ABSTRACT

Social hierarchy (i.e., the pyramid structure of societies) is a fundamental
concept in sociology and social network analysis. Social hierarchy matters in
a social network because its topological structure is essential both in
shaping the nature of social interactions between individuals and in unfolding
the structure of the network. The social hierarchy found in a social network
can be used to improve the accuracy of link prediction, provide better query
results, rank web pages, and study information flow and spread in complex
networks. In this paper, we model a social network as a directed graph $G$,
and consider the social hierarchy as a DAG (directed acyclic graph) of $G$,
denoted as $G_D$. In the DAG, all the vertices in $G$ are partitioned into
different levels, the vertices at the same level represent a disjoint group
in the social hierarchy, and all the edges follow one direction. The main
issue we study in this paper is how to find the DAG $G_D$ in $G$. Our approach
is to find $G_D$ by removing all possible cycles from $G$ such that
$G = \mathcal{U}(G) \cup G_D$, where $\mathcal{U}(G)$ is a maximum Eulerian
subgraph that contains all possible cycles. We give the reasons for doing so,
investigate the properties of the $G_D$ found, and discuss the applications.
In addition, we develop a novel two-phase algorithm, called Greedy-&-Refine,
which greedily computes an Eulerian subgraph and then refines this greedy
solution to find the maximum Eulerian subgraph. We give a bound between the
greedy solution and the optimal. The quality of our greedy approach is high.
We conduct comprehensive experimental studies over 14 real-world datasets.
The results show that our algorithms are at least two orders of magnitude
faster than the baseline algorithm.

1. INTRODUCTION 

Social hierarchy refers to the pyramid structure of societies, with a minority
at the top and the majority at the bottom, which is a prevalent and universal
feature in organizations. Social hierarchy is also recognized as a fundamental
characteristic of social interactions, and is well studied in both sociology
and psychology m-. In recent years, social hierarchy has attracted
considerable attention and has had a profound and lasting influence in various
fields, especially social networks. This is because the hierarchical structure
of a population is essential in shaping the nature of social interactions
between individuals and unfolding the structure of the underlying social
networks. Gould in El develops a formal theoretical model of the emergence of
social hierarchy, which can accurately predict the network structure. By the
social status theory in El, individuals with low status typically follow
individuals with high status. Clauset et al. in (§1 develop a technique to
infer the hierarchical structure of a social network based on the degree of
relatedness between individuals. They show that the hierarchical structure can
explain and reproduce some commonly observed topological properties of
networks and can also be used to predict missing links in networks. Assuming
that the underlying hierarchy is the primary factor guiding social
interactions, Maiya and Berger-Wolf in [22] infer social hierarchy from
undirected weighted social networks based on maximum likelihood. All these
studies imply that social hierarchy is a primary organizing principle of
social networks, capable of shedding light on many phenomena. In addition,
social hierarchy is used in many aspects of social network analysis and data
mining. For instance, social hierarchy can be used to improve the accuracy of
link prediction El, provide better query results El, rank web pages 021, and
study information flow and spread in complex networks [Hill.

In this paper, we focus on social networks that can be modeled by directed
graphs, because in many social networks (e.g., Google+, Weibo, Twitter)
information flow and influence propagate along certain directions from vertex
to vertex. Given a social network as a directed graph $G$, its social
hierarchy can be represented as a directed acyclic graph (DAG). In the DAG,
all the vertices in $G$ are partitioned into different levels (disjoint
groups), and all the edges in the cycle-free DAG follow one direction. This
matches the observation in social networks that prestigious users at high
levels are followed by users at low levels, and that prestigious users
typically do not follow their followers. Here, a level in the DAG represents
the status of a vertex in the hierarchy that the DAG represents.

The issue we study in this paper is how to find the hierarchy as a DAG in a
general directed graph $G$ that represents a social network. Given a graph
$G$, there are many possible ways to obtain a DAG. First, converting $G$ into
a DAG by contracting all vertices in a strongly connected component of $G$
into a single vertex does not serve the purpose, because the vertices in a
strongly connected component do not necessarily belong to the same level in a
hierarchy. Second, a random DAG does not serve the purpose, because it relies
heavily on which vertices are selected to start the traversal and on how the
traversal proceeds. Therefore, two random DAGs can be significantly different
topologically. Third, finding the maximum DAG of $G$ is not only NP-hard but
also hard to approximate m. Our approach is to find the DAG by removing all
possible cycles from $G$, following [13]. In [13], Gupte et al. propose a way
to decompose a directed graph $G$ into a maximum Eulerian subgraph
$\mathcal{U}(G)$ and a DAG $G_D$, such that $G = \mathcal{U}(G) \cup G_D$.
Here, all possible cycles in $G$ are in $\mathcal{U}(G)$, and no edge of $G_D$
appears in $\mathcal{U}(G)$. We take the same approach to find the DAG $G_D$
for a graph $G$ by finding the maximum Eulerian subgraph $\mathcal{U}(G)$ of
$G$ such that $G = \mathcal{U}(G) \cup G_D$, as given in [13].

Main contributions: We summarize the main contributions of our work as
follows. First, unlike [13], which studies a measure between 0 and 1 to
indicate how close a given directed graph is to a perfect hierarchy, we focus
on the hierarchy itself (the DAG). In addition to the properties investigated
in [13], we show that the $G_D$ found is representative and exhibits the
pyramid rank distribution. Moreover, the $G_D$ found can be used to study
social mobility and to recover hidden directions of social relationships.
Here, social mobility is a fundamental concept in sociology, economics and
politics, and refers to the movement of individuals from one status to
another. Second, we significantly improve the efficiency of computing the
maximum Eulerian subgraph $\mathcal{U}(G)$. Note that the time complexity of
the BF-U algorithm [13] is $O(nm^2)$, where $n$ and $m$ are the numbers of
vertices and edges, respectively. Such an algorithm is impractical, because it
can only work on small graphs. We propose a new algorithm with time
complexity $O(m^2)$, and a novel two-phase algorithm, called Greedy-&-Refine,
which greedily computes an Eulerian subgraph in $O(n + m)$ and then refines
this greedy solution to find the maximum Eulerian subgraph in $O(cm^2)$,
where $c$ is a very small constant less than 1. The quality of our greedy
approach is high. Finally, we conduct extensive performance studies using 14
real-world datasets to evaluate our algorithms and confirm our findings.

Further related work: Ball and Newman © analyze directed networks between
students with both reciprocated and unreciprocated friendships, and develop a
maximum-likelihood method to infer ranks between students such that most
unreciprocated friendships run from lower-ranked individuals to higher-ranked
ones, corresponding to status theory CD. Leskovec et al. in [19][18]
investigate signed networks and develop an alternative theory of status, in
place of the balance theory frequently used in undirected and unsigned
networks, both to explain observed edge signs and to predict unknown edge
signs. Influence has been widely studied ©; finding the social hierarchy
provides a new perspective for exploring influence given the existence of a
social hierarchy.

Eulerian graphs have been well studied in the theory community
©[TilDiTlIlol. For example, in in , Fleischner gives a comprehensive survey
on this topic. In [10], the same author surveys several applications of
Eulerian graphs in graph theory. Another closely related concept is the
super-Eulerian graph, which contains a spanning Eulerian subgraph Hi HI Ho);
here a spanning Eulerian subgraph means an Eulerian subgraph that includes
all vertices. The problem of determining whether or not a graph is
super-Eulerian is NP-complete Q. Most of these works focus mainly on the
properties of Eulerian subgraphs. There is little related work on computing
the maximum Eulerian subgraph for large graphs. To the best of our knowledge,
the only such work in the literature is by Gupte et al. in [13]. However, the
time complexity of their algorithm is $O(nm^2)$, which is clearly impractical
for large graphs.

Organization: In Section 2, we focus on the properties of the social
hierarchy found, after giving some useful concepts on the maximum Eulerian
subgraph, and discuss the applications. In Section 3, we discuss an existing
algorithm BF-U [13]. In Section 4, we propose a new algorithm DS-U of time
complexity $O(m^2)$, and treat it as the baseline algorithm. We present a new
two-phase algorithm GR-U for finding the maximum Eulerian subgraph, as well
as its analysis, in Section 5. Extensive experimental studies are reported in
Section 6. Finally, we conclude this work in Section 7.

Figure 1: Illustration of the maximum Eulerian subgraph

2. THE HIERARCHY 

Consider an unweighted directed graph $G = (V, E)$, where $V(G)$ and $E(G)$
denote the sets of vertices and directed edges of $G$, respectively. We use
$n = |V(G)|$ and $m = |E(G)|$ to denote the numbers of vertices and edges of
$G$, respectively. In $G$, a path $p = (v_1, v_2, \ldots, v_k)$ represents a
sequence of edges such that $(v_i, v_{i+1}) \in E(G)$ for each $v_i$
($1 \le i < k$). The length of path $p$, denoted as $len(p)$, is the number
of edges in $p$. A simple path is a path $(v_1, v_2, \ldots, v_k)$ with $k$
distinct vertices. A cycle is a path in which the same vertex appears more
than once, and a simple cycle is a path $(v_1, v_2, \ldots, v_{k-1}, v_k)$
where the first $k - 1$ vertices are distinct while $v_k = v_1$. For
simplicity, below, we use $V$ and $E$ to denote $V(G)$ and $E(G)$ of $G$,
respectively, when they are obvious. For a vertex $v_i \in V(G)$, the
in-neighbors of $v_i$, denoted as $N_I(v_i)$, are the vertices that link to
$v_i$, i.e., $N_I(v_i) = \{v_j \mid (v_j, v_i) \in E(G)\}$, and the
out-neighbors of $v_i$, denoted as $N_O(v_i)$, are the vertices that $v_i$
links to, i.e., $N_O(v_i) = \{v_j \mid (v_i, v_j) \in E(G)\}$. The in-degree
$d_I(v_i)$ and out-degree $d_O(v_i)$ of vertex $v_i$ are the numbers of edges
that direct to and from $v_i$, respectively, i.e., $d_I(v_i) = |N_I(v_i)|$
and $d_O(v_i) = |N_O(v_i)|$.

A strongly connected component (SCC) is a maximal subgraph of a directed
graph in which every pair of vertices $v_i$ and $v_j$ are reachable from each
other.

A directed graph $G$ is an Eulerian graph (or simply Eulerian) if for every
vertex $v_i \in V(G)$, $d_I(v_i) = d_O(v_i)$. An Eulerian graph can be either
connected or disconnected. An Eulerian subgraph of a graph $G$ is a subgraph
of $G$ that is Eulerian, denoted as $G_U$. The maximum Eulerian subgraph of a
graph $G$ is an Eulerian subgraph with the maximum number of edges, denoted
as $\mathcal{U}(G)$. Given a directed graph $G$, we focus on the problem of
finding its maximum Eulerian subgraph, $\mathcal{U}(G)$, which does not need
to be connected. Note that the problem of finding the maximum Eulerian
subgraph $\mathcal{U}(G)$ in a directed graph can be solved in polynomial
time, whereas the problem of finding the maximum connected Eulerian subgraph
is NP-hard [4]. The following example illustrates the concept of the maximum
Eulerian subgraph.

Example 2.1: Fig. 1 shows a graph $G = (V, E)$ with 14 vertices and 22 edges.
Its maximum Eulerian subgraph $\mathcal{U}(G)$ is a subgraph of $G$, where
its edges are in solid lines: $E(\mathcal{U}(G)) =$
{(t^l,W2), {v2,Va), (t^4,U3), (t’S.'Us), (tts.ttl), {VA,V(i), {vq,Va), 
{V3, ve), (V6, V 3 ), (V6, Vs), (vs,Vll), (vil,Vl 2 ), (vi 2 , V 13 ), (vi3, 
Via), {via, Vj), (vj, ve)}, and V (W(G)) is the set of vertices that 
appear in E{U{G)). 
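The defining condition of an Eulerian (sub)graph, equal in-degree and out-degree at every vertex, is easy to check mechanically. The following is a minimal sketch; the function name `is_eulerian` and the two tiny edge lists are illustrative assumptions and are not taken from Fig. 1.

```python
from collections import Counter

def is_eulerian(edges):
    """Return True if every vertex has equal in-degree and out-degree."""
    balance = Counter()
    for u, v in edges:
        balance[u] += 1  # one more outgoing edge at u
        balance[v] -= 1  # one more incoming edge at v
    return all(b == 0 for b in balance.values())

# Hypothetical examples: two disjoint cycles (Eulerian although disconnected)
# and a simple path (not Eulerian).
two_cycles = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 4)]
path = [(1, 2), (2, 3)]
print(is_eulerian(two_cycles), is_eulerian(path))
```

The first example also illustrates that an Eulerian subgraph in the sense used here does not need to be connected.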

The main issue here is to find a hierarchy of a directed graph $G$ as a DAG
$G_D$ by finding the maximum Eulerian subgraph $\mathcal{U}(G)$ of $G$. With
$\mathcal{U}(G)$ found, $G_D$ can be efficiently found due to
$G = \mathcal{U}(G) \cup G_D$ and
$E(\mathcal{U}(G)) \cap E(G_D) = \emptyset$. We discuss the properties of the
hierarchy $G_D$ and the applications.

The representativeness: The maximum Eulerian subgraph $\mathcal{U}(G)$ of a
general graph $G$ is not unique. A natural question is how representative
$G_D$ is as the hierarchy. Note that $G_D$ is only unique w.r.t. the
$\mathcal{U}(G)$ found. Below, we show that the $G_D$ identified by an
arbitrary $\mathcal{U}(G)$ is representative, based on a notion of
strictly-higher defined between two vertices in $G_D$ over a ranking
$r(\cdot)$ where $r(u) < r(v)$ for each edge $(u, v) \in G_D$. Here, for two
vertices $u$ and $v$, a larger rank implies that a vertex has a higher status
in a follower relationship, and $u$ is strictly-higher than $v$ if
$r(u) > r(v)$ and $u$ is reachable from $v$, i.e., there is a directed path
from $v$ to $u$ in $G_D$.

Theorem 2.1: Let $G_{D_1}$ and $G_{D_2}$ be two DAGs for $G$ such that
$G = \mathcal{U}_1(G) \cup G_{D_1} = \mathcal{U}_2(G) \cup G_{D_2}$. There
are no vertices $u$ and $v$ such that $u$ is strictly-higher than $v$ in
$G_{D_1}$ whereas $v$ is strictly-higher than $u$ in $G_{D_2}$.

Proof Sketch: Assume the opposite. We can construct an auxiliary graph
$G' = G \cup \{(u, v)\}$. Then finding the maximum Eulerian subgraph for $G'$
can be done in two steps. In the first step, find the maximum Eulerian
subgraph $\mathcal{U}(G)$, and in the second step, find the maximum Eulerian
subgraph for $G$ plus the additional edge $(u, v)$. Since
$G = \mathcal{U}_1(G) \cup G_{D_1} = \mathcal{U}_2(G) \cup G_{D_2}$, there
are at least two corresponding relaxing orders when the first step
terminates, namely, those identifying $\mathcal{U}_1(G)$ and
$\mathcal{U}_2(G)$. For one relaxing order, we can show that the added edge
$(u, v)$ can be relaxed, which results in finding $\mathcal{U}(G')$ such that
$|E(\mathcal{U}(G'))| > |E(\mathcal{U}(G))|$. For the other relaxing order,
we can show that the added edge $(u, v)$ cannot be relaxed and
$\mathcal{U}(G') = \mathcal{U}(G)$. This leads to a contradiction, because it
finds two maximum Eulerian subgraphs for $G'$ with different sizes.

Alternatively, let the rankings in $G_{D_1}$ and $G_{D_2}$ be $r_1(\cdot)$
and $r_2(\cdot)$. Assume there are two vertices $u$ and $v$ such that $u$ is
strictly-higher than $v$ by $r_1$ whereas $v$ is strictly-higher than $u$ by
$r_2$. We prove that this cannot happen, based on the finding in [13]. In
[13], a total score on $G$ measures how different $G$ is from a DAG $G_D$
based on a ranking $r(\cdot)$. The total score, denoted as $A(G, r)$, is
obtained by summing up the weights assigned to edges,
$\max\{r(u) - r(v) + 1, 0\}$ for edge $(u, v)$. The finding in [13] is that
the minimum total score equals the number of edges in the maximum Eulerian
subgraph, $\min_r\{A(G, r)\} = |E(\mathcal{U}(G))|$. Choose $r_1$ and $r_2$
such that $A(G, r_1) = |E(\mathcal{U}_1(G))|$ and
$A(G, r_2) = |E(\mathcal{U}_2(G))|$. Since $u$ is strictly-higher than $v$ in
$G_{D_1}$, there is a directed path from $v$ to $u$ in $G_{D_1}$. We can
construct an auxiliary graph $G' = G \cup \{(u, v)\}$; the new edge closes a
cycle with that path, so $|E(\mathcal{U}(G'))| > |E(\mathcal{U}_1(G))|$. On
the other hand, over the same $G'$, since $v$ is strictly-higher than $u$ in
$G_{D_2}$, the new edge $(u, v)$ has weight 0 under $r_2$, and we can show
$|E(\mathcal{U}(G'))| \le A(G', r_2) = A(G, r_2)$, which leads to a
contradiction. □
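The total score $A(G, r)$ used in the second argument can be evaluated directly from its definition. The sketch below is illustrative only: the function name `total_score`, the four-vertex graph, and both rankings are assumptions chosen for the example.

```python
def total_score(edges, rank):
    """A(G, r): sum of max{r(u) - r(v) + 1, 0} over all edges (u, v)."""
    return sum(max(rank[u] - rank[v] + 1, 0) for u, v in edges)

# Hypothetical graph: a directed triangle 0 -> 1 -> 2 -> 0 plus an edge 3 -> 0.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]

flat = {0: 0, 1: 0, 2: 0, 3: 0}      # every vertex at the same rank
layered = {3: 0, 0: 1, 1: 1, 2: 1}   # vertex 3 placed below the triangle

print(total_score(edges, flat))     # every edge contributes 1
print(total_score(edges, layered))  # only the three triangle edges contribute
```

Under the layered ranking the score is 3, which coincides with the number of edges of the triangle, the only cycle in this small graph.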

A case study: With the hierarchy (DAG $G_D$) found, suppose we assign every
vertex $u$ a minimum non-negative rank $r(u)$ such that $r(u) < r(v)$ for any
edge $(u, v) \in G_D$, where $r(\cdot)$ is a strictly-higher rank. To show
whether such a ranking reflects the ground truth, as a case study, we conduct
testing using Twitter, where the celebrities are known; for instance, refer
to Twitter Top 100 (http://twittercounter.com/pages/100). We sample a
subgraph among 41.7 million users (vertices) and 1.47 billion relationships
(edges) from the Twitter social graph G crawled in 2009 QSl. In brief, we
randomly sample 5 vertices in the celebrity set given in Twitter, and then
sample 1,000,000 vertices starting from the 5 vertices as seeds using random
walk sampling GD. We construct an induced subgraph G' of the 1,000,000
vertices sampled from G, and we uniformly sample about 10,000,000 edges from
G' to obtain the sample graph G, which contains 759,105 vertices and
11,331,061 edges. In G, we label a vertex $u$ as a celebrity if $u$ is in the
celebrity set and has at least 100,000 followers in G. There are 430
celebrities in G, including Britney Spears, Oprah Winfrey, Barack Obama, etc.
We compute the hierarchy ($G_D$) of G using our approach and rank the
vertices in $G_D$. The hierarchy reflects the truth: 88% of the celebrities
are in the top 1% of vertices and 95% of the celebrities are in the top 2% of
vertices. For efficiency, we can approximate the exact hierarchy with a
greedy solution obtained by Greedy in Section 5. In the approximate
hierarchy, 85% of the celebrities are in the top 1% of vertices and 93% of
the celebrities are in the top 2% of vertices.

Figure 2: Rank Distribution

Graph     |V|        |E|
Gplus0    100,000    115,090      2,833     6,271
Gplus1    100,000    512,281      14,797    70,537
Gplus2    100,000    2,867,781    51,605    770,854
Gplus3    100,000    8,289,203    87,941    3,644,147
Weibo0    100,000    2,431,525    96,765    850,136
Weibo1    100,000    2,446,002    96,833    855,131
Weibo2    100,000    2,463,050    96,902    861,729
Weibo3    100,000    2,479,140    96,969    868,044

Table 1: To study social mobility
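The minimum non-negative ranking used in the case study, with $r(u) < r(v)$ for every edge $(u, v)$ of $G_D$, can be computed by a topological pass over the DAG. The following is a minimal sketch under the assumption that $G_D$ is given as an edge list; the function name `min_ranks` and the toy DAG are illustrative and not part of the original study.

```python
from collections import defaultdict, deque

def min_ranks(vertices, dag_edges):
    """Smallest non-negative ranks with rank[u] < rank[v] for each DAG edge (u, v)."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in vertices}
    for u, v in dag_edges:
        succ[u].append(v)
        indeg[v] += 1
    rank = {v: 0 for v in vertices}
    queue = deque(v for v in vertices if indeg[v] == 0)
    while queue:  # Kahn-style topological traversal
        u = queue.popleft()
        for v in succ[u]:
            rank[v] = max(rank[v], rank[u] + 1)  # longest path into v so far
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return rank

# Hypothetical DAG: a and d follow c, a follows b, b follows c.
vertices = ["a", "b", "c", "d"]
dag_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "c")]
ranks = min_ranks(vertices, dag_edges)
print(ranks)
```

Each vertex receives the length of the longest path leading into it, so vertices without in-edges in $G_D$ sit at rank 0 and the most followed vertices end up at the top.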

The pyramid structure of the rank distribution is one of the most fundamental
characteristics of social hierarchy. We test the social networks wiki-Vote,
Epinions, Slashdot0902, Pokec, Google+, and Weibo. The details about the
datasets are in Table 2 and Table 3. The rank distribution derived from the
hierarchy $G_D$, shown in Fig. 2(a), indicates the existence of a pyramid
structure, while the rank distributions derived from a random DAG (Fig. 2(b))
and by contracting SCCs (Fig. 2(c)) are rather random. Here, the x-axis is
the rank, where a high rank means a high status, and the y-axis is the
percentage of all vertices that fall in a rank. Analyzing the vertices $u$ in
$G$ by the difference between in-degree and out-degree, i.e.,
$d_I(u) - d_O(u)$, shows that the vertices $u$ with negative
$d_I(u) - d_O(u)$ are always at the bottom of $G_D$, whereas the vertices at
higher ranks typically have large positive $d_I(u) - d_O(u)$ values.

The social mobility: With the DAG $G_D$ found, we can further study social
mobility over the social hierarchy that $G_D$ represents. Here, social
mobility is a fundamental concept in sociology, economics and politics, and
refers to the movement of individuals from one status to another. It is
important to identify individuals who jump from a low status (a level in
$G_D$) to a high status (a level in $G_D$). We conduct experimental studies
using the social network Google+ (http://plus.google.com) crawled from Jul.
2011 to Oct. 2011 (27l[^, and Sina Weibo (http://weibo.com) crawled from 28
Sep. 2012 to 29 Oct. 2012 [24]. For Google+ and Weibo, we randomly extract
100,000 vertices respectively, and then extract all edges among these
vertices in 4 time intervals during the period in which the datasets were
crawled, as shown in Table 1.

Figure 3: Social mobility result from hierarchy. Panels: (a) Google+ (exact),
(b) Weibo (exact), (c) Google+ (approx), (d) Weibo (approx). The x-axis is
the status in snapshot 1 (groups 1 to 5).

Algorithm 1 BF-U(G)
Input: A graph G = (V, E)
Output: Two subgraphs of G, U(G) and G_D (G = U(G) ∪ G_D)
1: w(v_i, v_j) ← -1 for each edge (v_i, v_j) ∈ E;
2: while there is a negative cycle p_c in G do
3:   for every edge (v_i, v_j) in the negative cycle p_c do
4:     w(v_i, v_j) ← -w(v_i, v_j);
5:     Reverse the direction of the edge (v_i, v_j) to be (v_j, v_i);
6:   end for
7: end while
8: G_D is a subgraph that contains all edges with weight -1;
9: U(G) is a subgraph containing the reversed edges with weight +1;

We show social mobility in Fig. 3. We compare two snapshots, $G_1$ and $G_2$,
and investigate the social mobility from $G_1$ to $G_2$. For Google+, $G_1$
and $G_2$ are Gplus0 and Gplus1, and for Weibo, $G_1$ and $G_2$ are Weibo0
and Weibo1. For $G_1$, we divide all vertices into 5 equal groups. The top
20% go into group 5, the second 20% go into group 4, and so on. In Fig. 3,
the x-axis shows the 5 groups for $G_1$. Consider the number of vertices in a
group as 100%. In Fig. 3, we show the percentage of vertices in one group
that move to another group in $G_2$. Fig. 3(a) and Fig. 3(b) show the results
for Google+ and Weibo. Some observations can be made. Google+ was a new
social network when crawled, since it started on Jun. 29, 2011, whereas Weibo
was a rather mature social network, since it started on Aug. 14, 2009. In
Fig. 3(a), many vertices move from one status to another, whereas in
Fig. 3(b) only a very small number of vertices move from one status to
another. Similar results can be observed from the approximate hierarchies
obtained by our greedy solution Greedy given in Section 5, as shown in
Fig. 3(c) and Fig. 3(d). The vertices that moved to or from the highest level
deserve further investigation.
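The group-to-group percentages described for the social mobility study can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce Fig. 3: the function names are hypothetical, both snapshots are assumed to rank the same vertex set, the groups of the second snapshot are assumed to be formed the same way as those of the first, and ties in rank are broken by vertex id.

```python
def quintile_groups(rank, groups=5):
    """Split vertices into equal-size groups by rank; group `groups` is the top."""
    order = sorted(rank, key=lambda v: (rank[v], v))  # ascending rank, ties by id
    n = len(order)
    return {v: i * groups // n + 1 for i, v in enumerate(order)}

def mobility(rank1, rank2, groups=5):
    """Row g: percentage of snapshot-1 group g+1 that lands in each snapshot-2 group."""
    g1 = quintile_groups(rank1, groups)
    g2 = quintile_groups(rank2, groups)
    counts = [[0] * groups for _ in range(groups)]
    for v in g1:
        counts[g1[v] - 1][g2[v] - 1] += 1
    table = []
    for row in counts:
        total = sum(row)
        table.append([100.0 * c / total if total else 0.0 for c in row])
    return table

# Hypothetical ranks for 10 vertices: unchanged, and completely inverted.
rank1 = {i: i for i in range(10)}
rank2 = {i: 9 - i for i in range(10)}
print(mobility(rank1, rank1)[0])  # bottom group stays in the bottom group
print(mobility(rank1, rank2)[0])  # bottom group moves to the top group
```

A mature network in the sense of the text would produce a table close to the first case, with most of the mass on the diagonal.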

Recovering the hidden directions means identifying the direction of an edge
when the direction of the edge is unknown iia. Recovering the directionality
of edges in social networks is important in many social analysis tasks. We
show that our approach has an advantage over the semi-supervised approach
(SM-ReDirect) in HD. Here, the task is to use the given 20% directed edges as
training data to recover the directions of the remaining edges. In our
approach, we construct a graph $G$ from the 20% training data, and identify
$G_D$ by $G = \mathcal{U}(G) \cup G_D$. With the ranking $r(\cdot)$ over the
vertices, we predict that the direction of an edge $(u, v)$ is from $u$ to
$v$ if $r(v) > r(u)$. On the Slashdot and Epinion datasets used in da, our
approach outperforms the matrix-factorization based SM-ReDirect both in terms
of accuracy and efficiency. For Slashdot, our prediction accuracy is 0.7759
whereas that of SM-ReDirect is 0.6529. For Epinion, ours is 0.8285 whereas
that of SM-ReDirect is 0.7118. Using the approximate hierarchy, our accuracy
is 0.7682 for Slashdot and 0.8277 for Epinion.
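The prediction rule above, orienting an edge from the lower-ranked endpoint to the higher-ranked one, can be sketched in a few lines. The function names, the toy ranking, and the decision to return `None` on ties are assumptions made for illustration; the text does not state how ties are handled.

```python
def predict_direction(u, v, rank):
    """Orient the pair {u, v} from the lower-ranked to the higher-ranked vertex."""
    if rank[v] > rank[u]:
        return (u, v)
    if rank[u] > rank[v]:
        return (v, u)
    return None  # equal ranks: the rule in the text does not decide (assumption)

def accuracy(true_edges, rank):
    """Fraction of held-out directed edges whose direction is predicted correctly."""
    hits = sum(predict_direction(u, v, rank) == (u, v) for u, v in true_edges)
    return hits / len(true_edges)

# Hypothetical ranks and held-out edges.
rank = {"a": 0, "b": 1, "c": 1}
held_out = [("a", "b"), ("c", "a")]
print(predict_direction("b", "a", rank))  # oriented from a to b
print(accuracy(held_out, rank))           # one of the two edges is recovered
```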

3. THE EXISTING ALGORITHM 


Algorithm 2 DS-U(G)
Input: A graph G = (V, E)
Output: Two subgraphs of G, U(G) and G_D (G = U(G) ∪ G_D)
1: for each edge (v_i, v_j) in E(G) do w(v_i, v_j) ← -1;
2: for each vertex u in V(G) do dst(u) ← 0, relax(u) ← true, pos(u) ← 0;
3: while there is a vertex u ∈ V(G) such that relax(u) = true do
4:   S_V ← ∅, S_E ← ∅, NV ← ∅;
5:   if FindNC(G, u) then
6:     while S_V.top() ≠ NV do
7:       S_V.pop(); (v_i, v_j) ← S_E.pop();
8:       w(v_i, v_j) ← -w(v_i, v_j);
9:       Reverse the direction of the edge (v_i, v_j) to be (v_j, v_i);
10:    end while
11:    S_V.pop(); (v_i, v_j) ← S_E.pop();
12:    w(v_i, v_j) ← -w(v_i, v_j);
13:    Reverse the direction of the edge (v_i, v_j) to be (v_j, v_i);
14:  end if
15: end while
16: G_D is a subgraph that contains all edges with weight -1;
17: U(G) is a subgraph containing the reversed edges with weight +1;

Algorithm 3 FindNC(G, u)
1: S_V.push(u);
2: for each edge (u, v) starting at pos(u) in E(G) do
3:   pos(u) ← pos(u) + 1;
4:   if dst(u) + w(u, v) < dst(v) then
5:     dst(v) ← dst(u) + w(u, v);
6:     relax(v) ← true, pos(v) ← 0;
7:     if v is not in S_V then
8:       S_E.push((u, v));
9:       if FindNC(G, v) then return true; endif
10:    else
11:      S_E.push((u, v)); NV ← v; return true;
12:    endif
13:  endif
14: end for
15: relax(u) ← false;
16: S_V.pop(); S_E.pop() if S_E is not empty; return false;


To find the maximum Eulerian subgraph, Gupte et al. in [13] propose an
iterative algorithm based on the Bellman-Ford algorithm, which we call BF-U
(Algorithm 1). Let $w(v_i, v_j)$ be a weight assigned to an edge
$(v_i, v_j)$ in $G$. Initially, BF-U assigns an edge weight of $-1$ to every
edge in graph $G$ (Line 1). Let a negative cycle be a cycle with a negative
sum of edge weights. In every iteration (Lines 2-7), BF-U finds a negative
cycle $p_c$, repeating until there are no negative cycles. For every edge
$(v_i, v_j)$ in the negative cycle $p_c$ found, it changes the weight of
$(v_i, v_j)$ to $-w(v_i, v_j)$ and reverses the direction of the edge
(Lines 4-5). As a result, it finds a maximum Eulerian subgraph
$\mathcal{U}(G)$ and a directed acyclic graph (DAG) of $G$, denoted as $G_D$,
such that $G = \mathcal{U}(G) \cup G_D$. Since the number of edges with
weight $+1$ increases by at least one during each iteration, there are at
most $O(m)$ iterations (Lines 2-7). In every iteration it has to invoke the
Bellman-Ford algorithm to find a negative cycle (Line 2), or to determine
that there is none. The time complexity of the Bellman-Ford algorithm is
$O(nm)$. Therefore, in the worst case, the total time complexity of BF-U is
$O(nm^2)$, which is too expensive for real-world graphs.
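The loop of BF-U can be sketched in Python as follows. This is an illustrative reading of Algorithm 1, not the implementation of [13]: vertices are assumed to be numbered 0 to n-1, the helper names are hypothetical, and negative-cycle detection uses the textbook Bellman-Ford variant that starts every vertex at distance 0.

```python
def find_negative_cycle(n, edges):
    """Return the edge indices of one negative cycle, or None if there is none."""
    dist = [0] * n    # distance 0 everywhere acts as a virtual source
    pred = [-1] * n   # index of the edge that last relaxed each vertex
    last = -1
    for _ in range(n):
        last = -1
        for i, (a, b, w) in enumerate(edges):
            if dist[a] + w < dist[b]:
                dist[b] = dist[a] + w
                pred[b] = i
                last = b
        if last == -1:
            return None   # a full pass without relaxation
    v = last
    for _ in range(n):    # step back n times to land on the cycle
        v = edges[pred[v]][0]
    cycle, cur = [], v
    while True:
        cycle.append(pred[cur])
        cur = edges[pred[cur]][0]
        if cur == v:
            return cycle

def bf_u(n, edge_list):
    """Split edge_list into (Eulerian edges, DAG edges) following Algorithm 1."""
    edges = [[a, b, -1] for a, b in edge_list]        # Line 1
    while True:
        cycle = find_negative_cycle(n, edges)         # Line 2
        if cycle is None:
            break
        for i in cycle:                               # Lines 3-6
            a, b, w = edges[i]
            edges[i] = [b, a, -w]                     # reverse and negate
    eulerian = [edge_list[i] for i, e in enumerate(edges) if e[2] == 1]
    dag = [edge_list[i] for i, e in enumerate(edges) if e[2] == -1]
    return eulerian, dag

# Hypothetical input: a triangle 0 -> 1 -> 2 -> 0 plus an extra edge 3 -> 0.
eulerian, dag = bf_u(4, [(0, 1), (1, 2), (2, 0), (3, 0)])
print(sorted(eulerian), dag)
```

Every call to `find_negative_cycle` scans all edges up to $n$ times, which is where the $O(nm)$ cost per iteration in the analysis above comes from.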

4. A NEW ALGORITHM 

To address the scalability problem of BF-U, we propose a new algorithm,
called DS-U. Unlike BF-U, which starts by finding a negative cycle using the
Bellman-Ford algorithm in every iteration, DS-U searches for a negative cycle
only when a condition indicates that it is necessary. In brief, in every
iteration, when necessary, DS-U invokes an algorithm FindNC (short for find a
negative cycle) to find a negative cycle while relaxing vertices in DFS
order. Applying amortized analysis [23], we prove that the time complexity of
DS-U is $O(m^2)$ to find the maximum Eulerian subgraph $\mathcal{U}(G)$.

The DS-U algorithm is outlined in Algorithm 2, which invokes FindNC
(Algorithm 3) to find a negative cycle. Here, FindNC is designed based on the
same idea of relaxing edges as used in the Bellman-Ford algorithm. In
addition to the edge weight $w(v_i, v_j)$, we use three variables for every
vertex $u$: relax(u), pos(u), and dst(u). Here, relax(u) is a Boolean
variable indicating whether there are out-going edges from $u$ that may need
to be relaxed to find a negative cycle. The algorithm tries to relax an edge
from $u$ further when relax(u) = true. When relaxing from $u$, pos(u) records
the next vertex $v$ in $N_O(u)$ (maintained as an adjacency list) for the
edge $(u, v)$ to be relaxed next. This means that all edges from $u$ to any
vertex before pos(u) have already been relaxed. dst(u) is an estimate on the
vertex $u$, which decreases when relaxing. When dst(u) decreases, relax(u) is
reset to true and pos(u) is reset to 0, since all its out-going edges can
possibly be relaxed again. Initially, in DS-U, every edge weight
$w(v_i, v_j)$ is initialized to $-1$, and the three variables relax(u),
pos(u), and dst(u) on every vertex $u$ are initialized to true, 0, and 0,
respectively. All of $w(v_i, v_j)$, relax(u), pos(u), and dst(u) are used in
FindNC to find a negative cycle following the main idea of the Bellman-Ford
algorithm in DFS order. A negative cycle, found by FindNC while relaxing
edges, is maintained using a vertex stack $S_V$ and an edge stack $S_E$
together with a variable NV, where NV maintains the first vertex of the
negative cycle. In DS-U, by popping vertices and edges from $S_V$ and $S_E$
until encountering the vertex in NV, the negative cycle can be recovered. As
shown in Algorithm 2, in the while statement (Lines 3-15), for every vertex
$u$ in $V(G)$, only when there is a possible relaxation (relax(u) = true) and
a negative cycle is found by FindNC does the algorithm reverse the edge
direction and update the edge weight $w(v_i, v_j)$ for each edge
$(v_i, v_j)$ in the negative cycle (Lines 6-13).
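One possible Python reading of Algorithms 2 and 3 is sketched below. It is an illustration under assumptions rather than the authors' implementation: vertices are numbered 0 to n-1, each vertex keeps a fixed list of incident edge ids (so pos(u) remains a valid index after edges are reversed, and edges not currently leaving u are skipped), stack membership is tracked with a set, and the recursion follows Algorithm 3 directly, so deep graphs would need an explicit stack instead.

```python
def ds_u(n, edge_list):
    """Split edge_list into (Eulerian edges, DAG edges) in the style of DS-U."""
    m = len(edge_list)
    tail = [a for a, _ in edge_list]   # current start of each edge
    head = [b for _, b in edge_list]   # current end of each edge
    w = [-1] * m
    inc = [[] for _ in range(n)]       # incident edge ids per vertex (never modified)
    for i, (a, b) in enumerate(edge_list):
        inc[a].append(i)
        if b != a:
            inc[b].append(i)
    dst, relax, pos = [0] * n, [True] * n, [0] * n

    def find_nc(u, sv, se, on_stack):
        """Algorithm 3; returns NV (first vertex of a negative cycle) or None."""
        sv.append(u)
        on_stack.add(u)
        while pos[u] < len(inc[u]):
            i = inc[u][pos[u]]
            pos[u] += 1
            if tail[i] != u:           # edge currently points into u: skip it
                continue
            v = head[i]
            if dst[u] + w[i] < dst[v]:
                dst[v] = dst[u] + w[i]
                relax[v], pos[v] = True, 0
                se.append(i)
                if v in on_stack:
                    return v           # the stack closes a negative cycle at v
                nv = find_nc(v, sv, se, on_stack)
                if nv is not None:
                    return nv
        relax[u] = False
        sv.pop()
        on_stack.discard(u)
        if se:
            se.pop()
        return None

    while True:
        u = next((x for x in range(n) if relax[x]), None)
        if u is None:
            break
        sv, se = [], []
        nv = find_nc(u, sv, se, set())
        if nv is None:
            continue
        while True:                    # Lines 6-13: unwind the cycle
            top, i = sv.pop(), se.pop()
            tail[i], head[i], w[i] = head[i], tail[i], -w[i]
            if top == nv:
                break
    eulerian = [edge_list[i] for i in range(m) if w[i] == 1]
    dag = [edge_list[i] for i in range(m) if w[i] == -1]
    return eulerian, dag

# Hypothetical encoding of a small graph: 0 -> 1, 0 -> 4 and the cycle 1 -> 2 -> 3 -> 1.
eulerian, dag = ds_u(5, [(0, 1), (0, 4), (1, 2), (2, 3), (3, 1)])
print(sorted(eulerian), sorted(dag))
```

The linear scan for a vertex with relax(u) = true keeps the sketch short; a worklist would avoid rescanning all vertices in every iteration.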



Figure 4: A subgraph G of Fig. 1

Example 4.1: We explain DS-U (Algorithm 2) using an example graph $G$ in
Fig. 4. For every vertex $u$, there is an adjacency list to maintain its
out-neighbors. Initially, for every vertex $u$, dst(u) = 0, relax(u) = true,
pos(u) = 0; and for every edge $(v_i, v_j)$, $w(v_i, v_j) = -1$. Suppose we
process $v_6, v_3, v_1, v_2, v_8$ in such an order. In the first iteration,
relax($v_6$) = true and FindNC($G$, $v_6$) returns true, which implies that a
negative cycle is found. Here, for all vertices in $G$, we have
dst($v_6$) = 0, relax($v_6$) = true, pos($v_6$) = 1;
dst($v_3$) = -4, relax($v_3$) = true, pos($v_3$) = 0;
dst($v_1$) = -2, relax($v_1$) = true, pos($v_1$) = 0;
dst($v_2$) = -3, relax($v_2$) = true, pos($v_2$) = 0;
dst($v_8$) = 0, relax($v_8$) = true, pos($v_8$) = 0.
In addition, NV = $v_3$, $S_V = \{v_6, v_3, v_1, v_2\}$, and
$S_E = \{(v_6, v_3), (v_3, v_1), (v_1, v_2), (v_2, v_3)\}$. Following
Lines 6-13 in Algorithm 2, we find and reverse the negative cycle
$(v_3, v_1, v_2, v_3)$ and make
$w(v_3, v_2) = w(v_2, v_1) = w(v_1, v_3) = 1$. In the second iteration, the
out-neighbors of $v_6$ are relaxed from pos($v_6$) = 1 in $v_6$'s adjacency
list, i.e., from edge $(v_6, v_8)$. FindNC($G$, $v_6$) returns false. We have
dst($v_6$) = 0, relax($v_6$) = false, pos($v_6$) = 2, and dst($v_8$) = -1,
relax($v_8$) = false, pos($v_8$) = 1. In the following iterations
(FindNC($G$, $v_3$), FindNC($G$, $v_1$), and FindNC($G$, $v_2$)), all return
false. Finally, for vertex $v_8$, since relax($v_8$) = false,
FindNC($G$, $v_8$) is unnecessary, and DS-U($G$) terminates. It finds the
maximum Eulerian subgraph
$\mathcal{U}(G) = \{(v_3, v_1), (v_1, v_2), (v_2, v_3)\}$.

Lemma 4.1: In Algorithm 2, if there is a negative cycle $C$, relax(u) = true
holds for at least one vertex $u \in V(C)$.

Proof Sketch: Assume the opposite, i.e., there exists a negative cycle $C$
such that for every vertex $u \in V(C)$, relax(u) = false. Let
$C = (v_0, v_1, \ldots, v_{k-1}, v_k = v_0)$; then
$\sum_{i=0}^{k-1} w(v_i, v_{i+1}) < 0$. Since relax($v_i$) = false holds for
$i = 0, 1, \ldots, k-1$, we have
$dst(v_{i+1}) \le dst(v_i) + w(v_i, v_{i+1})$. Summing both sides from
$i = 0$ to $i = k-1$ leads to a contradiction:
$dst(v_0) = dst(v_k) \le dst(v_0) + \sum_{i=0}^{k-1} w(v_i, v_{i+1}) < dst(v_0)$. □

Theorem 4.1: Algorithm 2 correctly finds the maximum Eulerian subgraph
$\mathcal{U}(G)$ when it terminates.

Proof Sketch: It can be proved by Lemma 4.1. □

Lemma 4.2: Given an Eulerian graph $G$, when DS-U($G$) terminates, for each
vertex $u$, $dst(u) \in [-2m, 0]$, where $m = |E(G)|$.

Proof Sketch: We use mathematical induction on the maximum number of cycles
that the Eulerian graph $G$ contains.

1. If $G$ contains only one cycle, i.e., $G$ is a simple cycle itself, it is
   easy to see that for each vertex $u$, $dst(u) \ge -m$, and hence
   $dst(u) \in [-2m, 0]$.

2. Assume Lemma 4.2 holds when $G$ contains no more than $k$ cycles; we prove
   that it also holds when $G$ contains at most $k + 1$ cycles. We first
   decompose $G$ into a simple cycle $C$, which is the last negative cycle
   found during DS-U($G$), and the remainder, an Eulerian graph $G'$
   containing at most $k$ cycles. This decomposition is valid for the
   following reason. If the last negative cycle found contained some positive
   edges, then the resulting maximum Eulerian subgraph $\mathcal{U}(G)$ would
   contain some negative edges, which contradicts the fact that $G$ itself is
   given as Eulerian. Next, we decompose DS-U($G$) into two phases: it finds
   $G'$ as an Eulerian subgraph in the first phase, while cycle $C$ is
   identified in the second phase. According to the assumption, when the
   first phase completes, for each vertex $u \in G'$,
   $dst(u) \in [-2|E(G')| - |E(C)|, 0]$, where $-2|E(G')|$ is due to
   DS-U($G'$) and $-|E(C)|$ is the result of relaxing $C$. There are two
   cases for the second phase.

   (a) If $V(C) \cap V(G') = \emptyset$, the two phases are independent.
       Therefore, when the second phase terminates, for each vertex
       $u \in V(C)$, $dst(u) \in [-2|E(C)|, 0]$, and for each vertex
       $u \in V(G')$, $dst(u) \in [-2|E(G')|, 0]$. Lemma 4.2 holds.

   (b) If $V(C) \cap V(G') \ne \emptyset$, suppose
       $w \in V(C) \cap V(G')$; then
       $dst(w) \in [-2|E(G')| - |E(C)|, 0]$ when the first phase completes.
       During the second phase, $dst(w)$ decreases by $|E(C)|$, so
       $dst(w) \ge -2|E(G')| - 2|E(C)|$ and $dst(w) \in [-2m, 0]$. For any
       vertex $v \in V(G') \setminus V(C)$, $dst(v)$ can only change along a
       path $p = (w_0, w_1, \ldots, w_{k-1}, w_k = v)$, where
       $w_0 \in V(C)$ and $(w_i, w_{i-1}) \in E(G')$ for $i = 1, \ldots, k$.
       Then
       $dst(v) = dst(w_0) + \sum_{i=0}^{k-1} w(w_i, w_{i+1}) \ge dst(w_0) \ge -2m$,
       so $dst(v) \in [-2m, 0]$. Therefore, Lemma 4.2 holds.

□

Figure 5: Example graph to explain 2(b) in Lemma 4.2

Example 4.2: We explain the proof of 2(b) in Lemma 4.2 using Fig. 5. Fig. 5 shows an Eulerian graph G containing simple cycles. Suppose the first negative cycle found is C_1 = (v_1, v_2, v_3, v_4, v_1), with the resulting dst(v_1) = −4, dst(v_2) = −1, dst(v_3) = −2, dst(v_4) = −3. In a similar way, suppose the second negative cycle found is C_2 = (v_1, v_5, v_6, v_3, v_2, v_1) by relaxing dst(v_1) = −5, dst(v_5) = −5, dst(v_6) = −6, dst(v_3) = −7, dst(v_2) = −6, and the third negative cycle is C_3 = (v_3, v_7, v_8, v_1, v_4, v_3) with dst(v_1) = −10, dst(v_4) = −9, dst(v_3) = −8, dst(v_7) = −8, dst(v_8) = −9. By reversing these three negative cycles, we have a cycle C = (v_1, v_2, v_3, v_4), and a graph G' which is a simple cycle (v_1, v_5, v_6, v_3, v_7, v_8, v_1). As can be seen, the current min{dst(u)} = dst(v_1) = −10 is in the range of [−2|E(G')| − |E(C)|, 0]. When DS-U(G) terminates, min{dst(u) | u ∈ V(C)} = dst(v_1) = −14 is in the range of [−2|E(G)|, 0], and min{dst(u) | u ∈ V(G') \ V(C)} = dst(v_8) = −13 is in the range of [−2|E(G)|, 0]. It shows that Lemma 4.2 holds for this example.

Lemma 4.3: Given a general graph G, when DS-U(G) terminates, for each vertex u, dst(u) ∈ [−4m, 0], where m = |E(G)|.

Proof Sketch: For a general graph G, we can add edges (u, v) from vertices u with d_I(u) > d_O(u) to vertices v with d_I(v) < d_O(v) and (u, v) ∉ E(G). Obviously, the resulting augmented graph G^A has at most 2m edges.

Based on Lemma 4.2, when DS-U(G^A) terminates, each vertex u satisfies dst(u) ∈ [−4m, 0]. On the other hand, DS-U(G^A) can be decomposed into two phases, DS-U(G) and further relaxations exploiting E(G^A) \ E(G), implying that for each vertex u, dst(u) ∈ [−4m, 0] holds when DS-U(G) terminates. □

Lemma 4.4: For each value of dst(u) of every vertex u, the out-neighbors of u, i.e. N_O(u), are relaxed at most once.

Proof Sketch: As shown in Algorithm 3, dst(u) is monotonically decreasing, and pos(u) is monotonically increasing for a particular dst(u) value. So Lemma 4.4 holds. □

Theorem 4.2: The time complexity of DS-U(G) is O(m²).

Proof Sketch: Given Lemma 4.2, Lemma 4.3, and Lemma 4.4, every edge (u, v) is checked at most |dst(u)| + |dst(v)| ≤ 8m times for relaxations. By applying amortized analysis [23], the time complexity of DS-U(G) is O(m²). □

Consider Algorithm 2. During each iteration of the while loop, only a small part of the graph can be traversed and most edges are visited at most twice. Therefore, each iteration can be approximately bounded as O(m), and the time complexity of DS-U is approximated as O(K · m), where K is the number of iterations, bounded by |E(U(G))| ≤ m. In the following discussion, we will analyze the time complexity of algorithms based on the number of iterations.

5. THE OPTIMAL: GREEDY-&-REFINE 

DS-U reduces the time complexity of BF-U to O(m²), but it is still very slow for large graphs. To further reduce the running time of DS-U, we propose a new two-phase algorithm which is shown to be two orders of magnitude faster than DS-U. Below, we first introduce an important observation which can be used to prune many unpromising edges. Then, we present our new algorithms as well as their theoretical analysis.

Let S be a set of strongly connected components (SCCs) of G, such that S = {G_1, G_2, ...}, where G_i is an SCC of G, G_i ⊆ G, and G_i ∩ G_j = ∅ for i ≠ j. We show that any edge that is not included in any SCC G_i of G cannot be contained in the maximum Eulerian subgraph U(G). Therefore, the problem of finding the maximum Eulerian subgraph of G becomes a problem of finding the maximum Eulerian subgraph of each G_i ∈ S, since the union of the maximum Eulerian subgraphs of G_i ∈ S, 1 ≤ i ≤ |S|, is the maximum Eulerian subgraph of G.

Lemma 5.1: An Eulerian graph G can be divided into several edge-disjoint simple cycles.

Proof Sketch: It suffices to show a process that repeatedly removes the edges of a cycle found in an Eulerian graph G, such that G has no edges after the last cycle is removed. Note that d_I(u) = d_O(u) for every u in G. Let a subgraph of G, denoted as G_C, be such a cycle found in G. G_C is an Eulerian subgraph, and G ⊖ G_C is also an Eulerian subgraph, so the process can be repeated until no edges remain. The lemma is established. □
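
The peeling process in this proof can be illustrated with a minimal Python sketch. It is not one of the paper's algorithms: the function name decompose_into_cycles and the edge-list representation are assumptions made here for illustration. The walk follows unused out-edges and peels off a simple cycle whenever a vertex repeats on the current path.

```python
from collections import defaultdict

def decompose_into_cycles(edges):
    """Split the edges of an Eulerian directed graph into edge-disjoint simple cycles.

    `edges` is a list of (u, v) pairs; parallel edges are allowed.
    Each cycle is returned as a vertex list whose first and last entries coincide.
    """
    out = defaultdict(list)
    balance = defaultdict(int)
    for u, v in edges:
        out[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    if any(balance.values()):
        raise ValueError("input is not Eulerian: some vertex has d_I != d_O")

    cycles = []
    for start in list(out):
        path, index = [start], {start: 0}   # current simple path and vertex positions
        # In an Eulerian graph the walk can only run out of edges back at `start`.
        while len(path) > 1 or out[start]:
            v = out[path[-1]].pop()          # consume one unused out-edge
            if v in index:                   # a vertex repeats: peel off a simple cycle
                i = index[v]
                cycles.append(path[i:] + [v])
                for w in path[i + 1:]:
                    del index[w]
                del path[i + 1:]
            else:
                index[v] = len(path)
                path.append(v)
    return cycles
```

Every edge is consumed exactly once, so the sketch runs in time linear in the number of edges.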

Theorem 5.1: Let G be a directed graph, and S = {G_1, G_2, ...} be a set of SCCs of G. The maximum Eulerian subgraph of G is U(G) = ∪_{G_i ∈ S} U(G_i).

Proof Sketch: For each edge e = (u, v) ∈ U(G), there is at least one cycle containing this edge, given by Lemma 5.1. Therefore, u and v belong to the same SCC, i.e., any edge e' ∈ G − S cannot be included in U(G). The theorem is established. □
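
As an illustration of the pruning that Theorem 5.1 permits, the following Python sketch drops every edge whose endpoints lie in different SCCs. The paper does not prescribe a particular SCC algorithm; Kosaraju's two-pass method is used here as an assumption, and the names scc_ids and prune_inter_scc_edges are hypothetical.

```python
from collections import defaultdict

def scc_ids(vertices, edges):
    """Kosaraju's algorithm (iterative); returns {vertex: representative of its SCC}."""
    out, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        out[u].append(v)
        rev[v].append(u)

    order, seen = [], set()
    for s in vertices:                       # pass 1: record DFS finishing order on G
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(out[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(out[v])))
                    break
            else:                            # all out-neighbors explored
                order.append(u)
                stack.pop()

    comp = {}
    for s in reversed(order):                # pass 2: flood fill on the transposed graph
        if s in comp:
            continue
        comp[s] = s
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rev[u]:
                if v not in comp:
                    comp[v] = s
                    stack.append(v)
    return comp

def prune_inter_scc_edges(vertices, edges):
    """Keep only edges inside an SCC; the others cannot belong to U(G)."""
    comp = scc_ids(vertices, edges)
    return [(u, v) for u, v in edges if comp[u] == comp[v]]
```

For example, with the edges (1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4), only the edge (3, 4) connects two different SCCs and is removed.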

Below, we discuss how to find the maximum Eulerian subgraph for each strongly connected component (SCC) G_i of G. In the following discussion, we assume that a graph G is an SCC itself.

We can use DS-U to find the maximum Eulerian subgraph for an SCC G. However, DS-U is still too expensive to deal with large graphs. The key issue is that the number of iterations in DS-U (Algorithm 2, Lines 3-15) can be very large when the graph and its maximum Eulerian subgraph are both very large. In most iterations, the number of edges with weight +1 increases only by 1, and thus it takes almost |U(G)| iterations to get the optimal number of edges in the maximum Eulerian subgraph U(G).

In order to reduce the number of iterations, we propose a two-phase Greedy-&-Refine algorithm, abbreviated as GR-U. Here, a Greedy algorithm computes an Eulerian subgraph of G, denoted as Ũ(G), and a Refine algorithm refines the greedy solution Ũ(G) to get the maximum Eulerian subgraph U(G), which needs at most |E(U(G))| − |E(Ũ(G))| iterations. The GR-U algorithm is given in Algorithm 4 and an overview is shown in Fig. 6. In Algorithm 4, it first computes all SCCs (Line 1). For each SCC G_i, it computes an Eulerian subgraph using Greedy, denoted as Ũ(G_i) (Line 3). In Greedy, in every iteration l (1 ≤ l ≤ l_max), it identifies a subgraph by an l-Subgraph algorithm, and further deletes/reverses, by DFS, all specific length-l paths called pn-paths, which we will discuss in detail. Note that l_max is a small number. After computing Ũ(G_i), G_i − Ũ(G_i) is nearly acyclic, and it moves all cycles from G_i − Ũ(G_i) to Ũ(G_i) (Line 4). Finally, it refines Ũ(G_i) to obtain the optimal U(G_i) by calling Refine (Line 5). The union of all U(G_i) is the maximum Eulerian subgraph for G. Below, we first list some important concepts introduced in the algorithm and analysis parts in Table 2, and then we detail the greedy algorithm and the refine algorithm, respectively.

Table 2: Notations

Used in            Symbol           Meaning
Greedy             label(u)         label(u) = d_O(u) − d_I(u)
Greedy             pn-path(u, v)    path (u = v_1, v_2, ..., v_l = v), label(u) > 0, label(v) < 0, and label(v_i) = 0 for 1 < i < l
Greedy             G_l              (l-Subgraph) subgraph of G that contains all pn-paths of length l
Greedy             G^T              V(G^T) = V(G), E(G^T) = {(u, v) | (v, u) ∈ E(G)}
Greedy             level(v)         the shortest distance from any vertex u with a positive label, label(u) > 0, in G
Greedy             rlevel(v)        the shortest distance to any vertex u with a negative label, label(u) < 0, in G
Refine / Analysis  Ḡ                V(Ḡ) = V(G); for every (v_i, v_j) ∈ E(G), (v_j, v_i) ∈ E(Ḡ) and w(v_j, v_i) = −w(v_i, v_j)
Refine / Analysis  p-path/n-path    a path where every edge is with a positive/negative weight
Refine / Analysis  k-cycle          (v_1^+, v_1^−, v_2^+, v_2^−, ..., v_k^+, v_k^−, v_1^+), where (v_i^+, v_i^−) are n-paths, and (v_i^−, v_{i+1}^+) plus (v_k^−, v_1^+) are p-paths
Refine / Analysis  Δ⁻               the total weight of n-edges for a k-cycle
Refine / Analysis  Δ⁺               the total weight of p-edges for a k-cycle
Refine / Analysis  𝒢                𝒢 = Ḡ_P ⊕ G_N, G_P = G ⊖ Ũ(G) and G_N = G ⊖ U(G)


Algorithm 4 GR-U (G)

1: Compute SCCs of G, S = {G_1, G_2, ...};
2: for each G_i ∈ S do
3:     Ũ(G_i) ← Greedy(G_i);
4:     Move all cycles found in G_i − Ũ(G_i) to Ũ(G_i); {Make G_i − Ũ(G_i) acyclic}
5:     U(G_i) ← Refine(Ũ(G_i), G_i);
6: end for
7: return ∪_{i=1}^{|S|} U(G_i);


[Figure 6 diagram: Greedy runs iterations l = 1, 2, ..., l_max; one iteration consists of l-Subgraph, O(n + m), and DFS, O(n + m), to delete/reverse pn-paths of length l, giving O(n + m) overall. Refine then turns Ũ(G) into U(G) in O(cm²), c ≪ 1.]

Figure 6: An Overview of Greedy-&-Refine

5.1 The Greedy Algorithms

Given a graph G, we propose two algorithms to obtain an initial Eulerian subgraph Ũ(G). The first algorithm is denoted as Greedy-D (Algorithm 5), which deletes edges from G to make d_I(v) = d_O(v) for every vertex v in Ũ(G). The second algorithm is denoted as Greedy-R (Algorithm 8), which reverses edges instead of deleting them for the same purpose. We use Greedy when we refer to either of these two algorithms. By definition, the resulting Ũ(G) is an Eulerian subgraph of G. The more edges we have in Ũ(G), the closer the resulting subgraph Ũ(G) is to U(G). We discuss some notations below.

The vertex label: For each vertex u in G, we define a vertex label on u, label(u) = d_O(u) − d_I(u). If label(u) = 0, it means that u can be a vertex in an Eulerian subgraph without any modifications. If label(u) ≠ 0, some adjacent edges need to be deleted/reversed to make label(u) zero.

The pn-path: We also define a positive-start and negative-end path between two vertices, u and v, denoted as pn-path(u, v). Here, pn-path(u, v) is a path p = (v_1, v_2, ..., v_l), where u = v_1 and v = v_l, with the following conditions: label(u) > 0, label(v) < 0, and label(v_i) = 0 for 1 < i < l. Clearly, by this definition, if we delete all the edges in pn-path(u, v), then label(u) decreases by 1, label(v) increases by 1, and all intermediate vertices in pn-path(u, v) keep their labels as zero. To make all vertex labels zero, the total number of such pn-paths to be deleted/reversed is N = Σ_{label(u)>0} label(u).

The transposition graph G^T: A transposition graph of G is a graph such that V(G^T) = V(G) and E(G^T) = {(u, v) | (v, u) ∈ E(G)}.
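
These definitions translate directly into code. The following Python sketch is only an illustration; the function names vertex_labels, num_pn_paths, and transpose are assumptions and do not appear in the paper.

```python
def vertex_labels(vertices, edges):
    """label(u) = d_O(u) - d_I(u) for every vertex u."""
    label = {u: 0 for u in vertices}
    for u, v in edges:
        label[u] += 1   # (u, v) contributes to the out-degree of u
        label[v] -= 1   # and to the in-degree of v
    return label

def num_pn_paths(label):
    """N: the number of pn-paths that must be deleted/reversed to zero all labels."""
    return sum(x for x in label.values() if x > 0)

def transpose(edges):
    """Edge list of the transposition graph G^T."""
    return [(v, u) for u, v in edges]
```

Since the labels always sum to zero, N also equals the absolute sum of the negative labels.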


Algorithm 5 Greedy-D (G)

1: l ← 1; G' ← G;
2: while some vertex u ∈ G' with label(u) > 0 do
3:     G' ← PN-path-D(G', l); l ← l + 1;
4: end while
5: return G';


Algorithm 6 PN-path-D (G, l)

1: G_l ← l-Subgraph(G, l);
2: Enqueue all vertices u ∈ V(G_l) with label(u) > 0 into queue Q;
3: while Q ≠ ∅ do
4:     u ← Q.top();
5:     Following DFS starting from u over G_l, traverse unvisited edges and mark them "visited"; let the path from u to v be pn-path(u, v), when it reaches the first vertex v in G_l with level(v) = l;
6:     if pn-path(u, v) ≠ ∅ then
7:         delete all edges in pn-path(u, v) from G;
8:         label(u) ← label(u) − 1; label(v) ← label(v) + 1;
9:         if label(u) = 0 then Q.dequeue();
10:    else
11:        Q.dequeue();
12:    end if
13: end while
14: return G;


The level and rlevel: level(v) is the shortest distance from any vertex u with a positive label, label(u) > 0, in G. rlevel(v) is the shortest distance from any vertex u with a positive label in G^T, where labels change sign under transposition. Equivalently, rlevel(v) is the shortest distance to any vertex u with a negative label, label(u) < 0, in G.

5.1.1 The Greedy-D Algorithm 

Below, we first concentrate on Greedy-D (Algorithm 5). Let G' be G (Line 1). In the while loop (Lines 2-4), it repeatedly deletes all pn-paths starting from length l = 1 by calling an algorithm PN-path-D (Algorithm 6) until there is no vertex u in G' with a positive label (label(u) > 0).

Example 5.1: Consider graph G in Fig. 7. Three vertices, v_2, v_4, and v_8, in double circles, have a label +1, and three other vertices, v_1, v_3, and v_7, in dashed circles, have a label −1. Initially, l = 1, and Greedy-D (Algorithm 5) deletes pn-path(v_2, v_3), making label(v_2) = label(v_3) = 0. When l = 2, pn-path(v_4, v_1) = (v_4, v_3, v_1) is deleted. Finally, when l = 5, pn-path(v_8, v_7) = (v_8, v_11, v_12, v_13, v_14, v_7) is deleted. In Fig. 7, the graph with solid edges is Ũ(G), i.e. the graph G' returned by Algorithm 5. It is worth mentioning that for the same graph G, DS-U needs 10 iterations. From the Eulerian subgraph Ũ(G) obtained by Greedy, it only needs at most 2 additional iterations to get the maximum Eulerian subgraph.

Figure 7: An Eulerian subgraph obtained by Greedy for graph G in Fig. 1

Figure 8: BFS-Trees used for constructing l-Subgraph

It is worth noting that Ũ(G) is not optimal. Some edges in Ũ(G) may not be in the maximum Eulerian subgraph, while some deleted edges should appear in the maximum Eulerian subgraph. In the next section, we will discuss how to obtain the maximum Eulerian subgraph U(G) from the greedy solution Ũ(G).

Finding all pn-paths with length l: The PN-path-D algorithm is shown in Algorithm 6. In brief, for a given graph G, PN-path-D first extracts a subgraph G_l ⊆ G which contains all pn-paths of length l that can be deleted from G, by calling an algorithm l-Subgraph (Algorithm 7) in Line 1. In other words, the edges in E(G) but not in E(G_l) cannot appear in any pn-paths with a length ≤ l. Based on the G_l obtained, PN-path-D deletes pn-paths from G (not from G_l) with additional conditions (Lines 2-13). Let G'_l be a subgraph of G_l that includes all edges appearing in pn-paths of length l to be deleted in PN-path-D. PN-path-D will return a subgraph G \ G'_l as a subgraph of G, which will be used in the next run in Greedy-D for deleting pn-paths with length l + 1.

We discuss the l-Subgraph algorithm (Algorithm 7), which extracts G_l from G by traversing G twice with BFS (breadth-first search). In the first BFS (Lines 4-6), it adds a virtual vertex s, and adds an edge (s, u) to every vertex u with a positive label (label(u) > 0) in G. Then, it assigns a level to every vertex in G as follows. Let level(s) be −1. By BFS, it assigns level(u) to be level(parent(u)) + 1, where parent(u) is the parent vertex of u following BFS. In the second BFS (Lines 7-10), it conceptually considers the transposition graph G^T of G by reversing every edge (v, u) ∈ E(G) as (u, v) ∈ E(G^T) (Line 7). Then, it assigns a different value, rlevel, to every vertex in G using the transposition graph G^T. Like the first BFS, it adds a virtual vertex t, and adds an edge (t, u) to every vertex u with a negative label (label(u) < 0) in G^T. Then, it assigns rlevel to every vertex in G^T as follows. Let rlevel(t) be −1. By BFS, it assigns rlevel(u) to be rlevel(parent(u)) + 1, where parent(u) is the parent vertex of u in G^T following BFS. The resulting subgraph G_l to be returned from l-Subgraph is extracted as follows. Here, V(G_l) contains all vertices u in G with level(u) + rlevel(u) = l for the given length l, and E(G_l) contains all edges (u, v) such that both u and v appear in V(G_l), (u, v) is an edge in the given graph G, and level(u) + 1 = level(v) (Lines 11-13). The following example illustrates how the l-Subgraph algorithm works.

Example 5.2: Fig. 9 illustrates the G_l returned by l-Subgraph (Algorithm 7) when l = 2. It is constructed using two BFS, i.e., BFS(G, s) and BFS(G^T, t), and the associated BFS-trees with level ≤ 2 and rlevel ≤ 2 are shown in Fig. 8(a) and Fig. 8(b), respectively. In Fig. 8(a), vertices v_1 and v_7 are the only vertices with label < 0. In Fig. 8(b), vertex v_4 is the only one with label > 0. Therefore, G_l contains only four edges, in dashed lines, which is much smaller than the original graph G to be handled.

Figure 9: An l-Subgraph for length l = 2

Algorithm 7 l-Subgraph (G, l)

1: for each vertex u in V(G) do
2:     level(u) ← ∞, rlevel(u) ← ∞;
3: end for
4: Add a virtual vertex s and an edge (s, u) from s to every vertex u in G if label(u) > 0;
5: level(s) ← −1;
6: level(u) ← level(parent(u)) + 1 for all vertices u in G following BFS starting from s;
7: Construct a graph G^T where V(G^T) = V(G) and E(G^T) = {(u, v) | (v, u) ∈ E(G)};
8: Add a virtual vertex t and an edge (t, u) from t to every vertex u in G^T if label(u) < 0;
9: rlevel(t) ← −1;
10: rlevel(u) ← rlevel(parent(u)) + 1 for all vertices u in G^T following BFS starting from t;
11: Extract a subgraph G_l;
12: V(G_l) = {u | level(u) + rlevel(u) = l};
13: E(G_l) = {(u, v) | u ∈ V(G_l), v ∈ V(G_l), (u, v) ∈ E(G), level(u) + 1 = level(v)};
14: return G_l;
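
The two BFS passes of l-Subgraph can be sketched in Python as follows. This is an illustrative reading of Algorithm 7, not the authors' implementation: the virtual vertices s and t are replaced by a multi-source BFS (which yields the same levels), and the names bfs_levels and l_subgraph are assumptions.

```python
from collections import defaultdict, deque

INF = float("inf")

def bfs_levels(vertices, edges, sources):
    """Multi-source BFS; equivalent to adding a virtual root at level -1."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    dist = {u: INF for u in vertices}
    queue = deque()
    for s in sources:
        dist[s] = 0
        queue.append(s)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == INF:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def l_subgraph(vertices, edges, label, l):
    """Return (V(G_l), E(G_l), level) for the given length l."""
    # First BFS: distances from positive-label vertices in G.
    level = bfs_levels(vertices, edges, [u for u in vertices if label[u] > 0])
    # Second BFS: distances from negative-label vertices in the transposition graph.
    rlevel = bfs_levels(vertices, [(v, u) for u, v in edges],
                        [u for u in vertices if label[u] < 0])
    keep = {u for u in vertices if level[u] + rlevel[u] == l}
    sub_edges = [(u, v) for u, v in edges
                 if u in keep and v in keep and level[u] + 1 == level[v]]
    return keep, sub_edges, level
```

On the path 1 → 2 → 3 with label(1) = +1 and label(3) = −1, the call with l = 2 returns the whole path, while l = 1 returns an empty subgraph.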

Lemma 5.2: By l-Subgraph, the resulting subgraph G_l includes all pn-paths of length l in G.

Proof Sketch: Recall that l-Subgraph returns a graph G_l where V(G_l) = {u | level(u) + rlevel(u) = l} and E(G_l) = {(u, v) | u ∈ V(G_l), v ∈ V(G_l), (u, v) ∈ E(G), level(u) + 1 = level(v)}. It implies the following. All vertices in G_l are on at least one shortest path of length l from a positive-label vertex u (label(u) > 0) to a negative-label vertex v (label(v) < 0). All edges are on such shortest paths. No edge in a pn-path of length l will be excluded from G_l. In other words, there does not exist an edge (u', v') on a pn-path(u, v) of length l which does not appear in E(G_l). □

We explain PN-path-D (Algorithm 6). Based on G_l obtained from G using l-Subgraph (Algorithm 7), in PN-path-D, we delete all possible pn-paths of length l from G (Lines 2-13). The deletion of all pn-paths of length l from the given graph G is done using DFS over G_l with a queue Q. It first pushes all vertices u in V(G_l) with a positive label (label(u) > 0) into queue Q, because they are the starting vertices of all pn-paths with length l. We check the vertex u on the top of queue Q. With the vertex u, we do DFS starting from u over G_l, traverse unvisited edges in G_l, and mark the edges visited as "visited". Let p be the first pn-path(u, v) with length l along DFS. We delete all edges on p, and adjust the labels to reduce label(u) by 1 and increase label(v) by 1. We dequeue u from queue Q when label(u) becomes zero or when we cannot find any more pn-paths of length l starting from u, i.e. when p returned by DFS(u) is empty. It is important to note that we only visit each edge at most once. There are two cases. One is that the edges visited will be deleted and there is no need to revisit them. The other is that they are marked as "visited" but not included in any pn-paths with length l. For this case, these edges will not appear in any other pn-paths starting from any other vertices.

Algorithm 8 Greedy-R (G)

1: l ← 1;
2: Assign an initial value of −1 to the weight w(v_i, v_j) for every edge (v_i, v_j) ∈ E;
3: while some vertex u ∈ G with label(u) > 0 do
4:     G ← PN-path-R(G, l); {PN-path-R is the same as PN-path-D (Algorithm 6) except that in Algorithm 6, Line 7 is changed to be "reverse all edges in pn-path(u, v) in G, both weights and directions"}
5:     l ← l + 1;
6: end while
7: Remove edges (v_i, v_j) from G if w(v_i, v_j) = +1;
8: return G;

Lemma 5.3: By PN-path-D, all pn-paths of length l are deleted.

Proof Sketch: It can be proved based on DFS over G_l obtained from l-Subgraph. □

Lemma 5.4: By PN-path-D, the resulting G does not include any pn-paths of length ≤ l.

Proof Sketch: Let G'_i be the resulting graph of PN-path-D after deleting all pn-paths of length i from G. It is trivial when i = 1. Assume that it holds for G'_i when i < l. We prove that it holds for G'_l when i = l. First, there are no pn-paths of length ≤ l − 1 in graph G'_{l−1} as a result of PN-path-D by assumption. Second, G'_l ⊆ G'_{l−1} because G'_l is obtained by deleting pn-paths of length l from G'_{l−1}, as given in the Greedy-D algorithm (Algorithm 5). Furthermore, in PN-path-D, every vertex u with label(u) = 0 in G'_{l−1} keeps label(u) = 0 in G'_l. If there is a pn-path(u, v) of length ≤ l − 1 found in G'_l, then it must be in G'_{l−1}, which contradicts the assumption. Therefore, G'_l does not include any pn-paths of length ≤ l. □

Theorem 5.2: The PN-path-D algorithm correctly identifies a subgraph G_l which contains all pn-paths of length l and returns a graph that includes no pn-paths of length ≤ l.

Proof Sketch: It can be proved by Lemma 5.2 and Lemma 5.4. □

We discuss the time complexity of the Greedy-D algorithm. In our experiments, we show that more than 99.99% of the pn-paths deleted in most real-world datasets have a length less than or equal to 6. We take the maximum length l_max in the Greedy-D algorithm, which is equivalent to the number of iterations of calling PN-path-D, as a constant, since it is always less than 100 in our extensive experiments. Here, both PN-path-D and l-Subgraph cost O(n + m), because l-Subgraph invokes BFS twice and PN-path-D performs DFS once in addition. Given l_max as a constant, the time complexity of the Greedy-D algorithm is O(n + m).
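
Putting Algorithms 5, 6, and 7 together, a compact Python sketch of Greedy-D might look as follows. It is a hedged illustration rather than the authors' code. The assumptions are: the graph is an edge list, "visited" marking is done by consuming edges from per-vertex lists, the DFS accepts an end vertex only while its label is still negative (a check Algorithm 6 leaves implicit), and the outer loop is additionally bounded by the number of vertices. The names _levels, pn_path_d, and greedy_d are hypothetical.

```python
from collections import defaultdict, deque

INF = float("inf")

def _levels(vertices, edges, sources):
    """Multi-source BFS distances (the virtual vertex s or t is implicit)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    dist = dict.fromkeys(vertices, INF)
    for s in sources:
        dist[s] = 0
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == INF:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def pn_path_d(vertices, edges, label, l):
    """Delete pn-paths of length l; returns surviving edges and updates label in place."""
    level = _levels(vertices, edges, [u for u in vertices if label[u] > 0])
    rlevel = _levels(vertices, [(v, u) for u, v in edges],
                     [u for u in vertices if label[u] < 0])
    in_sub = lambda x: level[x] + rlevel[x] == l
    sub = defaultdict(list)                 # unvisited l-Subgraph edges, by edge index
    for i, (u, v) in enumerate(edges):
        if in_sub(u) and in_sub(v) and level[u] + 1 == level[v]:
            sub[u].append(i)
    deleted = set()
    for start in [u for u in vertices if label[u] > 0 and in_sub(u)]:
        while label[start] > 0:
            path, u, found = [], start, False
            while True:
                if level[u] == l and label[u] < 0:
                    found = True            # reached a negative-label end vertex
                    break
                if sub[u]:
                    i = sub[u].pop()        # visit each edge at most once
                    path.append(i)
                    u = edges[i][1]
                elif path:
                    u = edges[path.pop()][0]  # dead end: backtrack
                else:
                    break
            if not found:
                break                       # no more pn-paths of length l from start
            deleted.update(path)
            label[start] -= 1
            label[u] += 1
    return [e for i, e in enumerate(edges) if i not in deleted]

def greedy_d(vertices, edges):
    """Greedy-D: delete pn-paths of increasing length until all labels are zero."""
    label = dict.fromkeys(vertices, 0)
    for u, v in edges:
        label[u] += 1
        label[v] -= 1
    l = 1
    while any(x > 0 for x in label.values()) and l <= len(vertices):
        edges = pn_path_d(vertices, edges, label, l)
        l += 1
    return edges
```

For the graph with edges (1, 2), (2, 3), (3, 1), (1, 4), (4, 2), the only positive label is at vertex 1 and the only negative label is at vertex 2, so the first round deletes the pn-path consisting of the edge (1, 2) and leaves the 4-cycle 1 → 4 → 2 → 3 → 1.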

5.1.2 The Greedy-R Algorithm

The Greedy-R algorithm is shown in Algorithm 8. Like Greedy-D, Greedy-R will result in an Eulerian subgraph. Unlike Greedy-D, it reverses the edges on pn-paths of length l, starting from l = 1, until there does not exist a vertex u in G with label(u) > 0. Initially, Greedy-R assigns every edge (v_i, v_j) in G a weight w(v_i, v_j) = −1. Then, in the while loop, it calls PN-path-R. PN-path-R is the same as PN-path-D (Algorithm 6) except that in Algorithm 6, Line 7 is changed to be "reverse all edges in pn-path(u, v) in G, both weights and directions". As a result, Greedy-R identifies an Eulerian subgraph of G, Ũ(G). Here, E(Ũ(G)) contains all edges with a weight of −1 and V(Ũ(G)) contains all the vertices in E(Ũ(G)). Below, we give two lemmas to prove the correctness of Greedy-R.

Figure 10: An example to explain PN-path-R.

Lemma 5.5: By PN-path-R, the resulting G does not include any pn-paths of length ≤ l.

Proof Sketch: Let G'_i be the resulting graph of PN-path-R(G, i), i.e. after reversing all pn-paths of length i from G. It is trivial when i = 1. Assume that it holds for G'_i when i < l. We prove that it holds for G'_l when i = l. Otherwise, suppose that there is a pn-path(v_1, v_2) of length < l in G'_l. Then there exists at least one edge e = (v, v') in pn-path(v_1, v_2) that has been reversed during PN-path-R(G, l); otherwise, pn-path(v_1, v_2) would be fully included in G'_{l−1}. Without loss of generality, assume that edge (v', v) is a part of pn-path(u_1, u_2) = (u_1, v', v, u_2) of length l. Fig. 10 shows G'_{l−1} (before calling PN-path-R(G, l)). Then, we can easily construct pn-path(u_1, v_2) = (u_1, v', v_2) and pn-path(v_1, u_2) = (v_1, v, u_2), and at least one of them is of length ≤ l − 1, contradicting the assumption. In addition, there cannot exist any pn-path of length l in G'_l. As a consequence, by PN-path-R, the resulting G does not include any pn-paths of length ≤ l. □

Similar to Theorem 5.2, the PN-path-R algorithm correctly identifies a subgraph G_l which contains all pn-paths of length l and returns a graph that includes no pn-paths of length ≤ l.

Theorem 5.3: The PN-path-R algorithm correctly identifies a subgraph G_l which contains all pn-paths of length l and returns a graph that includes no pn-paths of length ≤ l.

We omit the proof of Theorem 5.3, since it can be proved in a similar manner to Theorem 5.2 using Lemma 5.5.




(a) G' returned by Greedy-D (b) G returned by Greedy-R

Figure 11: Ũ(G) returned by Greedy-D and Greedy-R

It is worth noticing that Ũ(G) obtained by Greedy-R is at least as good as that obtained by Greedy-D. If each edge in G − Ũ(G) is reversed once, then the Ũ(G) obtained by Greedy-R is equivalent to that obtained by Greedy-D, as each edge appears in at most one pn-path. On the other hand, if there are some edges being reversed more than once, Greedy-R performs better. Fig. 11 shows the difference between Greedy-D and Greedy-R. Since the pn-paths of length 1 and 2 are the same, we only show the last deleted/reversed pn-path. In Fig. 11(a), we delete pn-path(v_8, v_7) = (v_8, v_11, v_12, v_13, v_14, v_7). On the other hand, in Fig. 11(b), we reverse pn-path(v_8, v_7) = (v_8, v_10, v_3, v_4, v_9, v_7). Here edge (v_4, v_3) is reversed twice. Ũ(G) returned by Greedy-R consists of solid lines, which is better than that returned by Greedy-D.

5.2 The Refine Algorithm

Algorithm 9 Refine (Ũ(G), G)

Input: A graph G, and the Eulerian subgraph obtained by Greedy, Ũ(G)
Output: Two subgraphs of G, U(G) and G_D (G = U(G) ∪ G_D)

1: for each edge (v_i, v_j) in E(G) do
2:     if (v_i, v_j) ∈ Ũ(G) then
3:         reverse the edge to be (v_j, v_i) in G; w(v_j, v_i) ← +1;
4:     else
5:         w(v_i, v_j) ← −1;
6:     end if
7: end for
8: Assign dst(u) for every u ∈ V(G) based on Eq. (1);
9: for each vertex u in V(G) do relax(u) ← true, pos(u) ← 0;
10: Enqueue every vertex u in V(G) into a queue Q;
11: u ← Q.front();
12: while Q ≠ ∅ do
13:     S_V ← ∅, S_E ← ∅;
14:     if relax(u) = true and FindNC(G, u) then
15:         Reverse negative cycle and change the edge weights using S_V and S_E (refer to Algorithm 2);
16:         Q ← Q ∪ S_V;
17:     else
18:         Q.pop(); u ← Q.front();
19:     end if
20: end while
21: G_D is a subgraph that contains all edges with a weight of −1;
22: U(G) is a subgraph that contains the edges reversed for all edges with a weight of +1;

With the greedy Eulerian subgraph Ũ(G) found, we have insight on G because we know G = Ũ(G) ∪ G_D where G_D is a DAG (acyclic), and can design a Refine algorithm based on such insight to reduce the number of times dst(u) is updated, which reduces the cost of relaxing. The Refine algorithm (Algorithm 9) is designed based on the similar idea given in DS-U using FindNC, with the following two enhancements.

First, we utilize G = Ũ(G) ∪ G_D to initialize the edge weight w(v_i, v_j) for every edge (v_i, v_j) and dst(u) for every vertex u in G. The edge weights are initialized in Lines 1-7 in Algorithm 9 based on Ũ(G), which is a greedy Eulerian subgraph. We also make use of G_D to initialize dst(u) based on Eq. (1) in Line 8.

$$
dst(u) =
\begin{cases}
0 & \text{if } d_I(u) \text{ in } G_D \text{ is } 0, \\
\min\{dst(v) - 1 \mid (v, u) \in E(G_D)\} & \text{if } u \in G_D, \\
0 & \text{otherwise}
\end{cases}
\qquad (1)
$$

Some comments on the initialization are made below. Following Algorithm 2, dst(u) can be initialized as dst(u) = 0. In fact, consider Lemma 4.1. No matter what dst(v_i) is for a vertex v_i (1 ≤ i ≤ k − 1) in a negative cycle C = (v_1, v_2, ..., v_k = v_1), the negative cycle can be identified because there is at least one edge (v_i, v_{i+1}) that can be relaxed. Based on it, if we initialize dst(u) in a way such that dst(u) ≤ dst(v) + w(v, u), then u cannot be relaxed through (v, u) before updating dst(v). It reduces the number of times dst(u) is updated, and improves the efficiency. We explain it further. Because for any edge (v, u) ∈ G_D, u can never be relaxed through edge (v, u) before dst(v) is updated, FindNC(G, u) will relax edges along a path with a few branches to identify a negative cycle. The variables such as relax(u) and pos(u) are initialized in Line 9 as done in Algorithm 2.
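
Because G_D is acyclic, the initialization in Eq. (1) can be evaluated in one pass over a topological order. The following Python sketch assumes Kahn's algorithm for the ordering, which the paper does not specify; the function name init_dst is hypothetical.

```python
from collections import defaultdict, deque

def init_dst(vertices, dag_edges):
    """Evaluate Eq. (1): dst(u) = min{dst(v) - 1 | (v, u) in E(G_D)}, or 0 if u has no in-edge.

    `dag_edges` lists the edges (v, u) of the acyclic graph G_D.
    """
    out = defaultdict(list)
    indeg = dict.fromkeys(vertices, 0)
    for v, u in dag_edges:
        out[v].append(u)
        indeg[u] += 1
    dst = dict.fromkeys(vertices, 0)         # vertices outside G_D keep dst = 0
    queue = deque(u for u in vertices if indeg[u] == 0)
    while queue:                             # process vertices in topological order
        v = queue.popleft()
        for u in out[v]:
            dst[u] = min(dst[u], dst[v] - 1)
            indeg[u] -= 1
            if indeg[u] == 0:
                queue.append(u)
    return dst
```

With this initialization every edge (v, u) of G_D satisfies dst(u) ≤ dst(v) − 1 = dst(v) + w(v, u), which is the condition discussed above. Applied to the edges deleted by Greedy-D in Example 5.1, it gives dst(v_1) = −2, dst(v_3) = −1, and dst(v_7) = −5.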

Second, we use a queue Q to maintain candidate vertices u from which there may exist negative cycles, i.e. those with relax(u) = true. Initially, all vertices are enqueued into Q. In each iteration, when invoking FindNC(G, u), let V' be the set of vertices relaxed. Among V', for any vertex w ∈ S_V \ {u}, dst(w) has been updated and only part of its out-neighbors have been relaxed when finding the negative cycle. On the other hand, for any vertex w ∈ V' \ S_V, all of the out-neighbors of w have been relaxed and cannot be relaxed before updating dst(w). We exclude w ∈ V' \ S_V from Q implicitly by setting relax(w) = false in FindNC(G, u).

Example 5.3: Suppose we have a greedy Eulerian subgraph Ũ(G) (Fig. 7) of G (Fig. 1) by Greedy-D, and will refine it to the optimal U(G) using Refine. Initially, all edges (solid lines) in Ũ(G) are reversed with an initial edge weight of +1, and all remaining edges in G_D are initialized with an edge weight of −1. dst(v_1) = −2, dst(v_3) = −1, dst(v_7) = −5, dst(v_11) = −1, dst(v_12) = −2, dst(v_13) = −3, dst(v_14) = −4, and other vertices u have dst(u) = 0. In the while loop, FindNC(G, v_1) relaxes dst(v_5) = −1 and returns false. This makes relax(v_1) = relax(v_5) = false, by which v_1 and v_5 are dequeued from Q. Afterwards, none of v_2, v_3, v_4, v_6 can relax any out-neighbors, and all are dequeued from Q. FindNC(G, v_7) relaxes all vertices, finds a negative cycle (v_7, v_9, v_4, v_2, v_3, v_10, v_8, v_11, v_12, v_13, v_14, v_7), and adds v_2, v_3, v_4 into Q as new candidates. Then, no vertices from v_8 to v_14 can relax any out-neighbors until FindNC(G, v_2) finds the last negative cycle (v_2, v_4, v_3, v_2). For most cases, FindNC(G, u) relaxes a few of u's out-neighbors.

We discuss the time complexity of Refine. The initialization (Lines 1-9) is O(n + m). Since Ũ(G) approximates U(G), the number of negative cycles found by Refine will be no more than |E(U(G))| − |E(Ũ(G))|, and each vertex u will have dst(u) updated less than |E(U(G))| − |E(Ũ(G))| times. This implies the while loop costs O((|E(U(G))| − |E(Ũ(G))|) · m). The time complexity of Refine is O(cm²), where c ≪ 1, as confirmed in our testing.


5.3 The Bound between Greedy and Optimal 

We discuss the bound between Ũ(G) obtained by Greedy and the maximum Eulerian subgraph U(G). To simplify our discussion, below, a graph G is a graph with multiple edges between two vertices but without self loops, and every edge (v_i, v_j) is associated with a weight w(v_i, v_j), which is initialized to be −1. Given a graph G, we use Ḡ to represent the reversed graph of G such that V(Ḡ) = V(G) and E(Ḡ) contains every edge (v_j, v_i) if (v_i, v_j) ∈ E(G), with w(v_j, v_i) = −w(v_i, v_j). In addition, we use two operations, ⊕ and ⊖, for two graphs G_i and G_j. Here, G_ij = G_i ⊕ G_j is an operation that constructs a new graph G_ij by the union of two graphs, G_i and G_j, such that V(G_ij) = V(G_i) ∪ V(G_j), and E(G_ij) = E(G_i) ∪ E(G_j). And G' = G_i ⊖ G_j is an operation that constructs a new graph G' by removing a subgraph G_j from G_i (G_j ⊆ G_i) such that V(G') = V(G_i) and E(G') = E(G_i) \ E(G_j). Given two Eulerian subgraphs, G_i and G_j, it is easy to show that G_i ⊕ G_j and G_i ⊖ G_j are still Eulerian graphs. Given any graph G, G ⊕ Ḡ is an Eulerian graph. Note that if there is a cycle with two edges, (v_i, v_j) and (v_j, v_i), between two vertices v_i and v_j in G, there will be four edges in G ⊕ Ḡ, i.e., two edges from G and two corresponding reversed edges from Ḡ.
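
As a small illustration of these operations, the following Python sketch models a weighted multigraph as a multiset of (u, v, w) triples. The representation via collections.Counter and the names reverse, oplus, ominus, and is_eulerian are assumptions made here, not notation from the paper.

```python
from collections import Counter

def reverse(g):
    """Reversed graph: every edge (u, v) with weight w becomes (v, u) with weight -w."""
    return Counter({(v, u, -w): c for (u, v, w), c in g.items()})

def oplus(g1, g2):
    """G1 (+) G2: multiset union of the edges, so parallel edges are kept."""
    return g1 + g2

def ominus(g1, g2):
    """G1 (-) G2: remove the edges of a subgraph G2 from G1."""
    return g1 - g2

def is_eulerian(g):
    """True if every vertex has equal in-degree and out-degree."""
    balance = Counter()
    for (u, v, _), c in g.items():
        balance[u] += c
        balance[v] -= c
    return all(b == 0 for b in balance.values())
```

For instance, a graph with the two edges (v_i, v_j) and (v_j, v_i), each of weight −1, yields four edges under oplus(g, reverse(g)): the two originals with weight −1 and their two reversals with weight +1.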

We discuss the bound using an Eulerian graph 𝒢 = Ḡ_P ⊕ G_N, where G_P = G ⊖ Ũ(G) and G_N = G ⊖ U(G). We call every edge in G_N a negative edge (n-edge), and a path in G_N a negative path (n-path). We also call every edge in Ḡ_P a positive edge (p-edge), and a path in Ḡ_P a positive path (p-path). It is important to note that p-edges are given for Ḡ_P but not for G_P; all n-edges have a weight of −1, while all p-edges have a weight of +1, because they are the reversed edges of G_P. Here, 𝒢 is a graph with multiple edges between a pair of vertices.

Figure 13: 𝒢 = Ḡ_P ⊕ G_N where G_P = G ⊖ Ũ(G) and G_N = G ⊖ U(G).

Example 5.4: Consider the example graph G in Fig. 1. The Eulerian subgraph obtained by Greedy, i.e. Ũ(G), is shown in Fig. 12(a). It is worth noting that we make use of the resulting graph of Greedy-D, since that obtained by Greedy-R is actually the maximum Eulerian subgraph in this case. Fig. 12(c) shows the maximum Eulerian subgraph U(G). As observed, some edges in Ũ(G) do not appear in U(G), while some edges that do not appear in Ũ(G) appear in U(G). Fig. 12(b) and Fig. 12(d) show G_P = G ⊖ Ũ(G) and G_N = G ⊖ U(G), respectively. Fig. 13 shows 𝒢 = Ḡ_P ⊕ G_N. In Fig. 13, the solid edges represent the p-edges from Ḡ_P, and the dashed edges represent the n-edges from G_N.

Since $\mathcal{G}$ is Eulerian, it can be divided into several edge-disjoint
simple cycles, as given by Lemma BTI. Among these cycles, there are no
cycles in $\mathcal{G}$ with only n-edges, because such cycles, if they
existed, would already be in $\mathcal{U}(G)$. And there are no cycles in
$\mathcal{G}$ with only p-edges, because all such cycles have been moved into
$\tilde{\mathcal{U}}(G_i)$ in GR-U (Algorithm 4, Line 4).
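The decomposition of an Eulerian multigraph into edge-disjoint simple cycles can be sketched as follows. This is an illustrative routine under our own assumptions (edges given as a list of pairs, every vertex balanced); the function name is hypothetical and it is not taken from the paper.

```python
from collections import defaultdict

def decompose_into_cycles(edges):
    """Split a balanced directed multigraph into edge-disjoint simple cycles.

    `edges` is a list of (u, v) pairs in which every vertex has equal
    in-degree and out-degree. Returns a list of cycles, each given as a
    list of indices into `edges`.
    """
    out = defaultdict(list)              # vertex -> unused outgoing edge ids
    for idx, (u, _) in enumerate(edges):
        out[u].append(idx)

    cycles = []
    for start in list(out):
        while out[start]:
            path = []                    # edge ids of the current walk
            seen = {start: 0}            # vertex -> walk length when reached
            v = start
            while True:
                e = out[v].pop()
                path.append(e)
                v = edges[e][1]
                if v in seen:            # the walk closed a simple cycle
                    break
                seen[v] = len(path)
            cut = seen[v]
            cycles.append(path[cut:])
            for e in path[:cut]:         # give back the unused prefix
                out[edges[e][0]].append(e)
    return cycles

example_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("b", "a"), ("a", "b")]
cycles = decompose_into_cycles(example_edges)
```

Every edge ends up in exactly one cycle, and within a cycle the head of each edge is the tail of the next one.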

Next, let a cycle be a positive-cycle if the total weight of the
edges in this cycle > 0, and let it be a negative-cycle if its total
weight of edges < 0. We show there are no negative-cycles in $\mathcal{G}$.

Lemma 5.6: There does not exist a negative-cycle in $\mathcal{G}$.

Proof Sketch: Assume there is a negative-cycle in $\mathcal{G}$, denoted as
$G_{cyc}$. Since there are no cycles with only p-edges or only n-edges, there
are both p-edges and n-edges in $G_{cyc}$. We divide $G_{cyc}$ into two
subgraphs, $\overline{G_p}$ and $G_n$. Here $\overline{G_p}$ consists of all p-edges,
where each p-edge has a weight of $+1$, and $G_n$ consists of all n-edges, where
each n-edge has a weight of $-1$. Clearly, $|E(\overline{G_p})| < |E(G_n)|$,
since $G_{cyc}$ is assumed to be a negative-cycle. Note that
$\mathcal{U}(G) \ominus G_p \oplus G_n$, which is equivalent to
$\mathcal{U}(G) \oplus G_{cyc} \ominus (G_p \oplus \overline{G_p})$, is Eulerian, and it
contains more edges than $\mathcal{U}(G)$, resulting in a contradiction.
Therefore, there does not exist a negative-cycle in $\mathcal{G}$. □

Figure 14: k-cycle. Panel (b): 2-cycle.

Figure 15: k-cycle generated by Greedy-R. Panels: (a) Step-1, (b) Step-2, (c) Step-3.

Lemma 5.6 shows that all cycles in $\mathcal{G}$ are non-negative. Since there
are no cycles with only p-edges or only n-edges, each cycle in $\mathcal{G}$ can
be partitioned into an alternating sequence of $k$ p-paths and $k$
n-paths, and represented as $(v_1^+, v_1^-, v_2^+, v_2^-, \ldots, v_k^+, v_k^-, v_1^+)$,
where $(v_i^+, v_i^-)$, for $i = 1, 2, \ldots, k$, are n-paths, and
$(v_{i-1}^-, v_i^+)$, for $i = 2, \ldots, k-1, k$, plus $(v_k^-, v_1^+)$ are p-paths.
We call such a cycle a k-cycle. Fig. 14(a) shows an example of a k-cycle, in
which an arrow represents a path; p-paths are drawn as solid lines and
n-paths as dashed lines.
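As a small illustration of this definition, the value of $k$ can be read off a cycle by counting its maximal runs of n-edges. The sketch below assumes the cycle is given as a list of edge weights in traversal order ($+1$ for a p-edge, $-1$ for an n-edge); the function name and this encoding are ours.

```python
def k_of_cycle(signs):
    """Return k for a cycle given its edge weights in traversal order.

    `signs` holds +1 for each p-edge and -1 for each n-edge. The cycle is
    assumed to contain both kinds of edges. k is the number of maximal
    runs of n-edges, counted cyclically (equal to the number of p-paths).
    """
    if +1 not in signs or -1 not in signs:
        raise ValueError("a k-cycle needs both p-edges and n-edges")
    k = 0
    for i, s in enumerate(signs):
        prev = signs[i - 1]            # i = 0 wraps around to the last edge
        if s == -1 and prev == +1:     # an n-path starts at this edge
            k += 1
    return k

example = [-1, -1, +1, -1, +1, +1]     # n-path, p-path, n-path, p-path
k = k_of_cycle(example)
weight = sum(example)
```

Here `example` is a 2-cycle whose total weight is 0, so it is non-negative as the lemma requires.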

The difference $|E(\mathcal{U}(G))| - |E(\tilde{\mathcal{U}}(G))|$ is equal to
$|E(G \ominus \tilde{\mathcal{U}}(G))| - |E(G \ominus \mathcal{U}(G))| = |E(G_P)| - |E(G_N)| = |E(\overline{G_P})| - |E(G_N)|$,
that is, the total number of edges in $\overline{G_P}$ minus the
total number of edges in $G_N$. On the other hand, the difference
$|E(\mathcal{U}(G))| - |E(\tilde{\mathcal{U}}(G))|$ can be considered as the total weight of
all k-cycles in $\mathcal{G}$. Recall that all edges in $G_N$ have weight $-1$ and
the edges in $\overline{G_P}$ have weight $+1$ by our definition. Assume that
$\mathcal{G} = \{C_1, C_2, \ldots\}$, where $C_i$ is a k-cycle. The total weight of
$\mathcal{G}$ regarding all k-cycles is $w(\mathcal{G}) = \sum_i w(C_i)$. Below, we bound
$|E(\mathcal{U}(G))| - |E(\tilde{\mathcal{U}}(G))|$ using k-cycles.
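The identity between the size gap and the total weight can be checked on a toy instance. In the sketch below the graph, the "greedy" subgraph, and the function name are all hypothetical; the greedy edge set is hand-picked and is not the output of the paper's Greedy algorithm.

```python
def signed_difference_graph(g, greedy, maximum):
    """Build the signed multigraph as a list of (u, v, weight).

    Edges of G outside the greedy subgraph are reversed and get weight +1
    (p-edges); edges of G outside the maximum subgraph get weight -1
    (n-edges).
    """
    p_edges = [(v, u, +1) for (u, v) in sorted(g - greedy)]
    n_edges = [(u, v, -1) for (u, v) in sorted(g - maximum)]
    return p_edges + n_edges

# Toy graph: a 2-cycle a<->b and a 3-cycle a->b->c->a sharing the edge a->b.
g = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "a")}
greedy = {("a", "b"), ("b", "a")}               # hypothetical greedy result
maximum = {("a", "b"), ("b", "c"), ("c", "a")}  # maximum Eulerian subgraph

signed = signed_difference_graph(g, greedy, maximum)
total_weight = sum(w for _, _, w in signed)
```

The signed graph here is the single cycle a, c, b, a with two p-edges and one n-edge, so its weight 1 equals the size gap between the two subgraphs.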

Consider $\mathcal{G}$ in Fig. 13; there are 3 k-cycles: $C_1 = (v_3, v_1, v_3)$
and $C_2 = (v_3, v_2, v_3)$ with weight 0, and
$C_3 = (v_8, v_{11}, v_{12}, v_{13}, v_{14}, v_7, v_9, v_4, v_3, v_{10}, v_8)$ with weight 2.
This means that at most 2 more iterations are needed to get the maximum
Eulerian subgraph from the greedy solution.

For a k-cycle $(v_1^+, v_1^-, v_2^+, \ldots, v_k^-, v_1^+)$, we use $A_k$ and $A'_k$
to represent the total weight of n-edges¹ and p-edges, i.e.,
$A_k = \sum_{i=1,\ldots,k} w(v_i^+, v_i^-)$ and
$A'_k = \sum_{i=1,\ldots,k-1} w(v_i^-, v_{i+1}^+) + w(v_k^-, v_1^+)$.
Because $A_k$ is determined by the optimal in $|E(G_N)| = |E(G \ominus \mathcal{U}(G))|$,
the bound is obtained when getting the maximum of $A'_k$.

¹ For n-edges, we take the absolute value of the total weight.

Theorem 5.4: The upper bound of the total weight of p-edges in a
k-cycle with specific k is k times that of n-edges, i.e., $A'_k \le k \cdot A_k$.

Proof Sketch: The proof is based on the way k-cycles are constructed
by Greedy-R. For simplicity, we first assume that each n-path and
p-path is a pn-path itself, and we will deal with general cases
later. Based on Greedy-R, a k-cycle is constructed as shown in
Fig. 15, which is a 4-cycle. Initially, there are 4 n-paths,
$n_i = (v_i^+, v_i^-)$, $i = 1, 2, \ldots, 4$, as Fig. 15(a) shows. Greedy-R deals
with pn-paths of length $l$ from a small $l$ to a large $l$. First, Greedy-R
finds a path $p_1$ = pn-path $(v_1^-, v_2^+)$, and combines $p_1$ with two
separated n-paths, $n_1$ and $n_2$, into a new n-path $n'_1$. Here, $len(p_1)$
is no larger than any $len(n_i)$. Greedy-R will repeat this process
to add all p-paths $p_i$ into the k-cycle in an ascending order of their
lengths. The last p-path to be added to the k-cycle should
be the longest one among all p-paths. Then its upper bound is
$\sum_{i=1,\ldots,k} w(v_i^+, v_i^-) + \sum_{i=1,\ldots,k-1} w(v_i^-, v_{i+1}^+)$. Otherwise, its
upper bound should be max{w(v~ Below, we prove Theorem 5.4.

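• For a 2-cycle (Fig. 14(b)): Since $w(v_1^-, v_2^+) \le w(v_1^+, v_1^-)$,
$w(v_1^-, v_2^+) \le w(v_2^+, v_2^-)$, and
$w(v_2^-, v_1^+) \le w(v_1^+, v_1^-) + w(v_1^-, v_2^+) + w(v_2^+, v_2^-)$, we have

$$\begin{aligned}
A'_2 &= w(v_1^-, v_2^+) + w(v_2^-, v_1^+) \\
&\le w(v_1^+, v_1^-) + 2 \cdot w(v_1^-, v_2^+) + w(v_2^+, v_2^-) \\
&\le 2 \cdot A_2
\end{aligned}$$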

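• For a 3-cycle (Fig. 14(c)): Since

$$\begin{aligned}
w(v_1^-, v_2^+) &\le w(v_1^+, v_1^-),\ w(v_2^+, v_2^-),\ w(v_3^+, v_3^-) \\
w(v_2^-, v_3^+) &\le w(v_1^+, v_1^-) + w(v_1^-, v_2^+) + w(v_2^+, v_2^-) \\
w(v_2^-, v_3^+) &\le w(v_3^+, v_3^-) \\
w(v_3^-, v_1^+) &\le w(v_1^+, v_1^-) + w(v_1^-, v_2^+) + w(v_2^+, v_2^-) + w(v_2^-, v_3^+) + w(v_3^+, v_3^-)
\end{aligned}$$

we have,

$$\begin{aligned}
A'_3 &= w(v_1^-, v_2^+) + w(v_2^-, v_3^+) + w(v_3^-, v_1^+) \\
&\le A_3 + 2 \cdot (w(v_1^-, v_2^+) + w(v_2^-, v_3^+)) \\
&\le 2 \cdot A_3 + 3 \cdot w(v_1^-, v_2^+) \le 3 \cdot A_3
\end{aligned}$$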

• Assume that it holds for k-cycles when $k < l$; we prove
that it also holds when $k = l$. Suppose that the shortest p-path
is $(v_1^-, v_2^+)$. Combine $(v_1^+, v_1^-)$, $(v_1^-, v_2^+)$, and $(v_2^+, v_2^-)$
into a single n-path $(v_1^+, v_2^-)$; then we get a k-cycle with
$k = l - 1$. As a result,
$A'_k \le (k - 1) \cdot (A_k + w(v_1^-, v_2^+)) + w(v_1^-, v_2^+) \le k \cdot A_k$. □
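The bound can be checked numerically on instances built the way the proof describes. The sketch below rests on assumptions of ours that go beyond the text: every new p-path is exactly as long as the shortest current n-path, it always merges the two leftmost n-paths, and the last p-path is as long as everything merged so far. It illustrates the inequality and does not reproduce Greedy-R.

```python
def extreme_p_weights(n_weights):
    """Generate p-path weights that meet the proof's constraints with equality.

    Each new p-path is as long as the shortest current n-path and merges the
    two leftmost n-paths into one; the final p-path closes the cycle.
    """
    paths = list(n_weights)
    p_weights = []
    while len(paths) > 1:
        p = min(paths)
        p_weights.append(p)
        paths = [paths[0] + p + paths[1]] + paths[2:]
    p_weights.append(paths[0])      # closing p-path
    return p_weights

def satisfies_bound(n_weights):
    """Check A'_k <= k * A_k for the instance generated above."""
    k = len(n_weights)
    return sum(extreme_p_weights(n_weights)) <= k * sum(n_weights)

two_cycle = extreme_p_weights([5, 5])   # the 2-cycle case meets the bound
```

For two n-paths of equal length the generated instance reaches the bound exactly, which is consistent with the 2-cycle case above.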

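Let $A'_{C_i}$ and $A_{C_i}$ denote the total weight of p-edges and n-edges
in a k-cycle $C_i$. Bounding $|E(\mathcal{U}(G))| - |E(\tilde{\mathcal{U}}(G))|$ can be
formulated as an LP (linear programming) problem.

$$\begin{aligned}
\max\ & \sum_{C_i} (A'_{C_i} - A_{C_i}) \\
\text{s.t.}\ & \text{(Cond-1)}\ A'_{C_i} \ge A_{C_i},\ \forall i, \\
& \text{(Cond-2)}\ A'_{C_i} \le k_i \cdot A_{C_i},\ \text{for a k-cycle } C_i \text{ with k-value } k_i, \\
& \text{(Cond-3)}\ \sum_{C_i} (A'_{C_i} + A_{C_i}) \le |E(\mathcal{G})| \le |E|
\end{aligned}$$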

In Fig. 16(a), $B_t$ on the y-axis illustrates the theoretical upper bound
of $|E(\mathcal{U}(G))| - |E(\tilde{\mathcal{U}}(G))| = \frac{K-1}{K+1}|E|$ obtained by solving the LP
problem, where the three solid lines represent the three conditions in the
above LP problem, respectively. Here, $K$ is the maximum among
all k values. The theoretical upper bound is far from tight. First,
$|E(\mathcal{G})| \ll |E|$, which is a tighter upper bound of
$\sum_{C_i} (A'_{C_i} + A_{C_i})$, moving Cond-3 towards the origin. Second, for most
k-cycles, $A'_k = (1 + \epsilon) \cdot A_k$, $0 \le \epsilon < 1$, since most p-paths in a
k-cycle are far from the upper bound they can reach. This moves Cond-2
towards the x-axis. Therefore, a tighter empirical upper bound
is $B_p$ on the y-axis in Fig. 16(b). We will show it in the experiments.
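The LP optimum can be reproduced in an aggregated form. The sketch below is our simplification, not the paper's formulation verbatim: it lumps all k-cycles together, replaces every $k_i$ by the maximum $K$, and treats the totals of n-edge and p-edge weight as two variables. Under that relaxation Cond-2 and Cond-3 are both tight at the optimum, which gives $(K-1)/(K+1) \cdot |E|$; the brute-force search over integer points is only a sanity check.

```python
from fractions import Fraction

def theoretical_gap_bound(num_edges, K):
    """Closed-form optimum of the aggregated LP: (K - 1) / (K + 1) * |E|."""
    return Fraction(K - 1, K + 1) * num_edges

def brute_force_gap(num_edges, K):
    """Best integer point of the aggregated LP, found by enumeration."""
    best = 0
    for a in range(num_edges + 1):              # total n-edge weight
        for a_prime in range(a, K * a + 1):     # Cond-1 and Cond-2
            if a + a_prime <= num_edges:        # Cond-3
                best = max(best, a_prime - a)
    return best

bound = theoretical_gap_bound(12, 3)
```

With 12 edges and K = 3 the optimum puts weight 3 on n-edges and 9 on p-edges, so both functions return a gap of 6.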



Figure 16: Upper Bounds. Panels: (a) Theoretical Upper Bound, (b) Empirical Upper Bound.

Figure 17: General p-paths

We have proved Theorem 5.4 for the case where p-paths and n-paths
are pn-paths, which shows that each p-path in a k-cycle has an
implicit upper bound. In general, there are cases where p-paths are
not pn-paths. In fact, each p-path in a k-cycle can be classified
into two classes: (a) a p-path that is a part of a pn-path, including
the case that the p-path is itself a pn-path; (b) a p-path that can be
divided into several sub-paths, each of which is a part of a pn-path.
In Fig. 17, there are three p-paths in the k-cycle: $(v_1^-, v_2^+)$ is a
part of pn-path $(v_1^-, v_4)$, $(v_2^-, v_3^+)$ itself is a pn-path, and
$(v_3^-, v_1^+)$ consists of two sub-paths, $(v_3^-, v_5)$ and $(v_5, v_1^+)$, each of
which is a part of a pn-path or is itself a pn-path.

For the cases when a p-path in a k-cycle is not a pn-path, we use
$W_p$ and $W_u$ to denote its practical weight and the theoretical upper
bound it can reach when it is itself a pn-path, respectively. Since
we concentrate on the weight of p-paths, we treat such a p-path as a
pn-path with weight $W_p$ if $W_p \le W_u$, and treat it as a pn-path
with weight $W_u$ if $W_p > W_u$ and add the difference $W_p - W_u$ to a
global variable $W$. We will show in Section 6 that $W$ is very small
compared with $|E(\mathcal{U}(G))|$.

Time complexity: Revisiting GR-U (Algorithm 4), it includes four
parts: SCC decomposition (Line 1), Greedy (Line 3), cycle moving
(Line 4) and Refine (Line 5). SCC decomposition can be accomplished
in 2 DFS, in time $O(n + m)$. As analyzed in Section]^
Greedy invokes PN-path $l_{max}$ times, and each PN-path needs 2
BFS (l-Subgraph) and 1 DFS (remove/reverse pn-paths). Since
$l_{max}$ is small (< 100 in our extensive experiments), the time
complexity of Greedy is $O(n + m)$. Regarding moving cycles from
$G_i - \tilde{\mathcal{U}}(G_i)$ to $\tilde{\mathcal{U}}(G_i)$, it is equivalent to moving cycles
from non-trivial SCCs of $G_i - \tilde{\mathcal{U}}(G_i)$ to $\tilde{\mathcal{U}}(G_i)$. Based on the
fact that $G_i - \tilde{\mathcal{U}}(G_i)$ is near acyclic, there are only a few cycles
in $G_i - \tilde{\mathcal{U}}(G_i)$, so cycle moving is in $O(n + m)$. The time
complexity of Refine, as given in Section lS^ is $O(cm^2)$, because most
FindNC($G$, $u$) calls relax edges along a path with a few branches, and
vertices $u$ will have $dst(u)$ updated less than
$|E(\mathcal{U}(G))| - |E(\tilde{\mathcal{U}}(G))|$ times.
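For reference, one standard way to obtain the SCC decomposition with two DFS passes is Kosaraju's algorithm. The text does not say which two-pass method is used, so the sketch below is only an assumption-labelled illustration; vertices are assumed to be numbered 0 to n-1 and the function name is ours.

```python
from collections import defaultdict

def strongly_connected_components(n, edges):
    """Kosaraju's algorithm: two DFS passes, O(n + m) time.

    Returns a list `comp` where comp[v] is the component id of vertex v.
    """
    adj, radj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: iterative DFS on G, recording vertices by finishing time.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(adj[w])))
                    break
            else:                       # all neighbours done: v is finished
                order.append(v)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finishing time.
    comp, count = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = count
        stack = [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if comp[w] == -1:
                    comp[w] = count
                    stack.append(w)
        count += 1
    return comp

comp = strongly_connected_components(
    5, [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)])
```

In the example, vertices 0, 1, 2 form one non-trivial SCC and vertices 3, 4 form another.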

6. PERFORMANCE STUDIES 

We conduct extensive experiments to evaluate the two proposed GR-U
algorithms. One is GR-U-D, using Greedy-D (Algorithm 5) and
Refine (Algorithm 9), and the other is GR-U-R, using Greedy-R
(Algorithm]^) and Refine (Algorithm 9). We do not compare our
algorithms with BF-U in Ha, because BF-U is in $O(nm^2)$ and
is too slow. We use our DS-U as the baseline algorithm, which is
$O(m^2)$. We show that Greedy produces an answer which is very
close to the exact answer. In order to confirm that Greedy is of time
complexity $O(n + m)$, we show that the largest iteration $l_{max}$ used in
Greedy is a small constant by showing that the longest pn-path (the
same as $l_{max}$) deleted/reversed by Greedy is small. In addition,
we confirm that the constant $c$ of $O(c \cdot m^2)$ for Refine is very small
by showing statistics of $\mathcal{G}$, $W$, and k-cycles. We also confirm the
scalability of GR-U as well as Greedy and Refine.

Graph            |V|          |E|           |V(U(G))|    |E(U(G))|
wiki-Vote        7,115        103,689       1,286        17,676
Gnutella         62,586       147,892       11,952       18,964
Epinions         75,879       508,837       33,673       264,995
Slashdot0811     77,360       828,159       70,849       734,021
Slashdot0902     82,168       870,159       71,833       748,580
web-NotreDame    325,729      1,469,679     99,120       783,788
web-Stanford     281,903      2,312,497     211,883      691,521
amazon           403,394      3,387,388     399,702      1,973,965
Wiki-Talk        2,394,385    5,021,410     112,030      1,083,509
web-Google       875,713      5,105,039     461,381      1,841,215
web-BerkStan     685,230      7,600,595     478,774      2,068,081
Pokec            1,632,803    30,622,560    1,297,362    20,911,934

Table 3: Summary of real Datasets

Graph            Refine      GR-U-D      Refine      GR-U-R      DS-U        c
                 (GR-U-D)    (total)     (GR-U-R)    (total)
wiki-Vote        0.1         0.1         0.1         0.1         1.0         0.100
Gnutella         0.5         0.5         0.4         0.4         1.6         0.250
Epinions         15.9        16.1        15.2        15.4        414.4       0.037
Slashdot0811     80.6        80.8        70.9        71.0        12,748.6    0.006
Slashdot0902     87.3        87.5        76.6        76.8        14,324.5    0.005
web-NotreDame    2.6         3.0         2.4         2.7         370.4       0.007
web-Stanford     21.5        25.7        16.7        24.9        2,780.0     0.009
amazon           126.5       133.5       124.8       130.5       44,865.0    0.003
Wiki-Talk        504.3       504.9       487.3       487.9       9,120.1     0.053
web-Google       100.2       110.3       78.6        84.6        35,271.7    0.002
web-BerkStan     129.7       137.7       67.8        75.9        7,853.9     0.010
Pokec            30,954.5    30,983.7    30,120.4    30,140.5    -           -
Gplus2           363.5       364.2       360.5       361.2       39,083.8    0.009
Weibo0           206.5       207.3       202.4       203.3       8,004.6     0.025

Table 4: Efficiency of GR-U-D, GR-U-R and DS-U (running time in seconds)

All these algorithms are implemented in C++ and compiled by
gcc 4.8.2, and tested on a machine with a 3.40GHz Intel Core i7-4770
CPU and 32GB RAM, running Linux. The time unit used is the second.

Datasets: We use 14 real datasets. Among the datasets, Epinions,
wiki-Vote, Slashdot0811, Slashdot0902, Pokec, Google+, and
Weibo are social networks; web-NotreDame, web-Stanford,
web-Google, and web-BerkStan are web graphs; Gnutella is a
peer-to-peer network; amazon is a product co-purchasing network; and
Wiki-Talk is a communication network. All the datasets are
downloaded from the Stanford large network dataset collection (http : / / sr
except for Google+ and Weibo. The detailed information of the
datasets is summarized in Table [T] and Table 3. In the tables, for
each graph, the 2nd and 3rd columns show the numbers of vertices
and edges², respectively, and the 4th and 5th columns show the
numbers of vertices and edges of its maximum Eulerian subgraph,
respectively.

² For each dataset, we delete all self-loops if they exist.

Efficiency: Table 4 shows the efficiency of these three algorithms,
i.e., GR-U-D, GR-U-R, and DS-U, over 14 real datasets. For
GR-U-D, the 2nd column shows the running time of Refine and the 3rd
column shows the total running time of GR-U-D. As can be seen,
for GR-U-D, the running time of Refine dominates that of
Greedy-D. The 4th and 5th columns show the running time of Refine and
the total running time of GR-U-R, respectively. Likewise, the
Refine algorithm is the most time-consuming procedure in GR-U-R.

Figure 18: $|E(\tilde{\mathcal{U}}(G))| / |E(\mathcal{U}(G))|$


Graph            IRD        ISD %    IRR        ISR %    IR_DSU
wiki-Vote        659        95.4     629        95.6     14,361
Gnutella         2,504      69.5     1,410      82.8     8,202
Epinions         5,466      97.4     5,334      97.4     207,124
Slashdot0811     11,464     97.9     9,990      98.2     541,970
Slashdot0902     12,036     97.8     10,426     98.1     554,163
web-NotreDame    9,030      98.1     6,119      98.7     486,240
web-Stanford     23,427     94.8     15,721     96.5     448,960
amazon           75,104     94.1     61,818     95.2     1,282,326
Wiki-Talk        37,662     95.7     36,139     95.9     871,020
web-Google       90,375     92.4     59,387     95.0     1,196,616
web-BerkStan     69,078     95.2     41,703     97.1     1,437,188
Pokec            686,765    -        635,286    -        -
Gplus2           18,766     96.9     18,721     96.9     613,008
Weibo0           25,991     96.2     24,550     96.4     686,765

Table 5: The numbers of Iterations


It is important to note that both GR-U-D and GR-U-R significantly
outperform DS-U. In most large datasets, GR-U-D and GR-U-R are
two orders of magnitude faster than DS-U. For instance, in the
web-Stanford dataset, GR-U-R takes 25 seconds to find the maximum
Eulerian subgraph, while DS-U takes 2,780 seconds, which is more
than 100 times slower. In addition, it is worth mentioning that in
the Pokec dataset, DS-U cannot get a solution in two weeks. In the
last column, $c$ is the $c$ value in Refine's time complexity $O(cm^2)$,
obtained by comparing the running time of GR-U-R and DS-U. In all
graphs, $c \ll 1$. Note that BF-U is very slow; for example, BF-U takes
more than 30,000 seconds to handle the smallest dataset, wiki-Vote,
while our GR-U takes only 0.1 second.
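The reported values of $c$ are consistent with a plain ratio of the two running times. The sketch below makes that reading explicit; the text only says the value comes from "comparing" the running times, so the formula is our inference from the Table 4 numbers, and the function name is hypothetical.

```python
# (graph, GR-U-R total seconds, DS-U seconds, reported c), copied from Table 4
rows = [
    ("wiki-Vote", 0.1, 1.0, 0.100),
    ("Gnutella", 0.4, 1.6, 0.250),
    ("Epinions", 15.4, 414.4, 0.037),
    ("web-Stanford", 24.9, 2780.0, 0.009),
]

def estimate_c(gr_u_r_seconds, ds_u_seconds):
    """Ratio of GR-U-R running time to DS-U running time, to 3 decimals."""
    return round(gr_u_r_seconds / ds_u_seconds, 3)

estimates = {name: estimate_c(fast, slow) for name, fast, slow, _ in rows}
```

For these four rows the ratio reproduces the last column of Table 4 to three decimals.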

Effectiveness of Greedy: To evaluate the effectiveness of the greedy
algorithms, we first study the size of the Eulerian subgraph obtained
by Greedy-D and Greedy-R. Fig. 18 depicts the results. In Fig. 18,
$|E(\tilde{\mathcal{U}}(G))|$ denotes the size of the Eulerian subgraph obtained by the
greedy algorithms, $|E(\mathcal{U}(G))|$ denotes the size of the maximum
Eulerian subgraph, and $|E(\tilde{\mathcal{U}}(G))| / |E(\mathcal{U}(G))|$ denotes the ratio
between them. The ratios obtained by both Greedy-D and Greedy-R
are very close to 1 in most datasets. That is to say, both Greedy-D
and Greedy-R can get a near-maximum Eulerian subgraph,
indicating that both Greedy-D and Greedy-R are very effective. The
ratio of Greedy-R is slightly better than that of Greedy-D,
which supports our analysis. The ratio of the Gnutella dataset using
Greedy-D is slightly lower than the others. One possible reason is that
Gnutella is much sparser than the other datasets, so some inappropriate
pn-path deletions may result in enlarging other pn-paths;
this situation can be largely relieved in Greedy-R.

Graph            |E(U(G))|     $|E(\mathcal{G})|$    W
wiki-Vote        17,676        3,214        20
Gnutella         18,964        6,906        10
Epinions         264,995       30,997       129
Slashdot0811     734,021       45,315       118
Slashdot0902     748,580       46,830       145
web-NotreDame    783,788       10,439       3,963
web-Stanford     691,521       35,402       6,168
amazon           1,973,965     202,513      12,994
Wiki-Talk        1,083,509     158,848      331
web-Google       1,841,215     149,425      22,361
web-BerkStan     2,068,081     105,569      16,991
Pokec            20,911,934    3,003,797    8,964
Gplus2           770,854       117,641      80
Weibo0           850,136       124,395      384

Table 6: Statistics of $|\mathcal{G}|$ and $W$

Figure 19: Distributions of k-cycles for each k. Panels: (a) Epinions, (b) web-Stanford. The x-axis is k = 2, 3, ..., 9, >=10.

Second, we investigate the numbers of iterations used in GR-U-D,
GR-U-R, and DS-U. Table 5 reports the results. In Table 5, the
2nd and 4th columns, 'IRD' and 'IRR', denote the numbers of
iterations used in the refinement procedure (i.e., Refine, Algorithm 9)
of GR-U-D and GR-U-R, respectively. The last column, 'IR_DSU',
reports the total number of iterations used in DS-U. From these
columns, we can see that in large graphs (e.g., the web-NotreDame
dataset), the numbers of iterations used in Refine of GR-U-D and
GR-U-R are more than an order of magnitude smaller than those
used in DS-U (roughly 54 and 79 times smaller, respectively). In
addition, it is worth mentioning that in the Pokec
dataset, DS-U cannot get a solution in two weeks. The 3rd and 5th
columns report the percentages of iterations saved by GR-U-D and
GR-U-R, respectively. Both Greedy-D and Greedy-R can reduce at
least 95% of the iterations in most datasets. Similarly, the results
obtained by GR-U-R are slightly better than those obtained by GR-U-D.
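The saved-iteration percentages in Table 5 agree with one minus the ratio of Refine iterations to DS-U iterations. The sketch below states that reading explicitly; it is inferred from the table values and the function name is ours.

```python
def iterations_saved_percent(refine_iterations, ds_u_iterations):
    """Percentage of DS-U iterations avoided, rounded to one decimal."""
    return round(100 * (1 - refine_iterations / ds_u_iterations), 1)

# Values copied from Table 5 (IRD, IRR, IR_DSU).
wiki_vote_d = iterations_saved_percent(659, 14361)
wiki_vote_r = iterations_saved_percent(629, 14361)
gnutella_d = iterations_saved_percent(2504, 8202)
gnutella_r = iterations_saved_percent(1410, 8202)
```

These four values reproduce the ISD % and ISR % entries for wiki-Vote and Gnutella.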

The largest iteration $l_{max}$: We show the largest iteration $l_{max}$ in
Greedy by showing the longest pn-paths deleted/reversed, which is
the number of times PN-path-D/PN-path-R is invoked by
Greedy-D/Greedy-R on the real datasets. Below, the first/second number is the
length of the longest pn-path deleted/reversed: wiki-Vote (9/9),
Gnutella (29/22), Epinions (12/10), Slashdot0811 (6/6), Slashdot0902 (8/8),
web-NotreDame (96/41), web-Stanford (275/221), amazon (57/37),
Wiki-Talk (9/7), web-Google (93/37), web-BerkStan (123/85), Pokec
(14/13), Gplus2 (9/8), and Weibo0 (12/10). The longest pn-paths
deleted or reversed are always of small size, especially compared
with $|E|$. Therefore, the time complexity of Greedy can be
regarded as $O(n + m)$.

The support to a small c: We support the claim that $c$ in
$O(cm^2)$ for Refine is small by giving statistics of $\mathcal{G}$, $W$, and
k-cycles. We first show the statistics of $|\mathcal{G}|$ ($\mathcal{G} = \overline{G_P} \oplus G_N$) and $W$
discussed in Section 5.3. Table 6 reports the results. From Table 6,
we can find that for each graph, $|E(\mathcal{G})|$ and $W$ are small compared
with $|E(\mathcal{U}(G))|$. These results confirm our theoretical analysis in
Section B©] Second, we study the statistics of k-cycles. The results
for the Epinions and web-Stanford datasets are depicted in Fig. 19, and
similar results can be observed from other datasets. In Fig. 19, the
y-axis denotes the ratio between the total weights of p-edges and the
total weights of n-edges (i.e., $A'_k / A_k$ defined in Section 5.3), and
the x-axis denotes $k$ for k-cycles, where $k = 2, 3, \ldots, \ge 10$. As
can be seen, for all k-cycles, the ratios are always smaller than 2
in both the Epinions and web-Stanford datasets. These results confirm
our analysis in Section ISA]

Figure 20: Scalability: web-NotreDame and web-Stanford

Scalability: We test the scalability of GR-U-R, GR-U-D, and
DS-U. We report the results for web-NotreDame and web-Stanford in
Fig. 20. Similar results are observed for other real datasets. To
test the scalability, we sample 10 subgraphs starting from 10% of
the edges, up to 100% in 10% increments. Fig. 20(a) and Fig. 20(b)
show that both GR-U-R and GR-U-D scale well. For web-NotreDame,
we further show the performance of Greedy and Refine in Fig. 20(c)
and Fig. 20(d). In Fig. 20(c), Greedy seems to be not really linear.
We explain the reason below. Revisiting Algorithm 4, the efficiency
of Greedy is mainly determined by two factors: the graph size (or
more precisely the size of the largest SCC) and the number of times
PN-path is invoked (i.e., $l_{max}$). When a subgraph is sparse, both the SCC
size and $l_{max}$ tend to be small (the smallest sample graph with 10% of
the edges contains a largest SCC with 1,155 vertices and 4,317 edges,
and $l_{max}$ = 30/16 for Greedy-D/Greedy-R), whereas both the
size of the largest SCC and $l_{max}$ tend to be large in dense subgraphs
(the entire graph contains a largest SCC with 53,968 vertices and
296,228 edges, and $l_{max}$ = 96/41 for Greedy-D/Greedy-R).
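The sampling setup can be sketched as follows. The text does not state how the subgraphs are drawn, so uniform edge sampling with nested samples is purely an assumption of this sketch, and the function name and `seed` parameter are hypothetical.

```python
import random

def nested_edge_samples(edges, steps=10, seed=0):
    """Return `steps` edge lists holding 10%, 20%, ..., 100% of the edges.

    Assumption: edges are sampled uniformly at random, and every larger
    sample contains all smaller ones.
    """
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    m = len(shuffled)
    return [shuffled[: (m * i) // steps] for i in range(1, steps + 1)]

toy_edges = [(i, i + 1) for i in range(50)]
samples = nested_edge_samples(toy_edges)
```

Each sample would then be passed to the algorithm under test and timed, giving one point per 10% step.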


7. CONCLUSION 

In this paper, we study social hierarchy computing to find a
social hierarchy $G_D$ as a DAG from a social network represented as
a directed graph $G$. To find $G_D$, we study how to find a maximum
Eulerian subgraph $\mathcal{U}(G)$ of $G$ such that $G = \mathcal{U}(G) \cup G_D$.
We justify our approach, and give the properties of $G_D$ and the
applications. The key is how to compute $\mathcal{U}(G)$. We propose a
DS-U algorithm to compute $\mathcal{U}(G)$, and develop a novel two-phase
Greedy-&-Refine algorithm, which greedily computes an Eulerian
subgraph and then refines this greedy solution to find the maximum
Eulerian subgraph. The quality of our greedy approach is
high, and it can be used to support social mobility and recover the
hidden directions. We conduct extensive experiments to confirm
the efficiency of our Greedy-&-Refine approach.


[1] D. Acemoglu, A. Ozdaglar, and A. ParandehGheibi. Spread of (mis)
information in social networks. Games and Economic Behavior,
70(2), 2010.

[2] J. A. Almendral, L. Lopez, and M. A. Sanjuan. Information flow in
generalized hierarchical networks. Physica A: Statistical Mechanics
and its Applications, 324(1), 2003.

[3] B. Ball and M. E. Newman. Friendship networks and social status.
Network Science, 1(01), 2013.

[4] L. Cai and B. Yang. Parameterized complexity of even/odd subgraph
problems. Journal of Discrete Algorithms, 9(3), 2011.

[5] P. A. Catlin. Supereulerian graphs: a survey. Journal of Graph
Theory, 16(2), 1992.

[6] W. Chen, L. V. Lakshmanan, and C. Castillo. Information and
Influence Propagation in Social Networks. Morgan & Claypool,
2013.

[7] Z. Chen and H. Lai. Reduction techniques for supereulerian graphs
and related topics: a survey. Combinatorics and graph theory, 95,
1995.

[8] A. Clauset, C. Moore, and M. E. Newman. Hierarchical structure and
the prediction of missing links in networks. Nature, 453(7191), 2008.

[9] H. Fleischner. Eulerian graphs and related topics, volume 1. North
Holland, 1990.

[10] H. Fleischner. (Some of) the many uses of Eulerian graphs in graph
theory (plus some applications). Discrete Mathematics, 230(1), 2001.

[11] R. V. Gould. The origins of status hierarchies: A formal theory and
empirical test. American Journal of Sociology, 107(5), 2002.

[12] M. S. Granovetter. The strength of weak ties. American Journal of
Sociology, 1973.

[13] M. Gupte, P. Shankar, J. Li, S. Muthukrishnan, and L. Iftode. Finding
hierarchy in directed online social networks. In Proc. of WWW'11,
2011.

[14] V. Guruswami, R. Manokaran, and P. Raghavendra. Beating the
random ordering is hard: Inapproximability of maximum acyclic
subgraph. In Proc. of FOCS'08, 2008.

[15] J. M. Kleinberg. Authoritative sources in a hyperlinked environment.
Journal of the ACM (JACM), 46(5), 1999.

[16] H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social
network or a news media? In Proc. of WWW'10, 2010.

[17] J. Leskovec and C. Faloutsos. Sampling from large graphs. In Proc.
of KDD'06, 2006.

[18] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive
and negative links in online social networks. In Proc. of WWW'10,
2010.

[19] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Signed networks in
social media. In Proc. of CHI'10, 2010.

[20] D. Li, D. Li, and J. Mao. On maximum number of edges in a
spanning eulerian subgraph. Discrete Mathematics, 274(1), 2004.

[21] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for
social networks. Journal of the American Society for Information
Science and Technology, 58(7), 2007.

[22] A. S. Maiya and T. Y. Berger-Wolf. Inferring the maximum
likelihood hierarchy in social networks. In Proc. of CSE'09, 2009.

[23] R. E. Tarjan. Amortized computational complexity. SIAM Journal on
Algebraic Discrete Methods, 6(2), 1985.

[24] J. Zhang, B. Liu, J. Tang, T. Chen, and J. Li. Social influence locality
for modeling retweeting behaviors. In Proc. of IJCAI'13, 2013.

[25] J. Zhang, C. Wang, and J. Wang. Who proposed the relationship?:
recovering the hidden directions of undirected social networks. In
Proc. of WWW'14, 2014.

[26] N. Zhenqiang Gong, A. Talwalkar, L. Mackey, L. Huang, E. C. R.
Shin, E. Stefanov, D. Song, et al. Jointly predicting links and
inferring attributes using a social-attribute network (SAN). ACM
Workshop on Social Network Mining and Analysis, 2011.

[27] N. Zhenqiang Gong, W. Xu, L. Huang, P. Mittal, E. Stefanov,
V. Sekar, and D. Song. Evolution of social-attribute networks:
Measurements, modeling, and implications using google+.
ACM/USENIX Internet Measurement Conference, 2012.


